The use of autonomous robots for assistance tasks in hospitals has the
potential to free up qualified staff and improve patient care. However, the
ubiquity of deformable and transparent objects in hospital settings poses
significant challenges to vision-based perception systems. We present
EfficientPPS, a neural architecture for part-aware panoptic segmentation that
provides robots with semantically rich visual information for grasping and
manipulation tasks. We also present an unsupervised data collection and
labelling method to reduce the need for human involvement in the training
process. EfficientPPS is evaluated on a dataset containing real-world hospital
objects and demonstrated to be robust and efficient in grasping transparent
transfusion bags with a collaborative robot arm.
( 2 min )
In this paper, we study the collaborative learning model, which concerns the
tradeoff between parallelism and communication overhead in multi-agent
multi-armed bandits. For regret minimization in multi-armed bandits, we present
the first set of tradeoffs between the number of rounds of communication among
the agents and the regret of the collaborative learning process.
( 2 min )
We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D
radiance fields parameterized by 3D Gaussian primitives from pairs of images.
Our model features real-time and memory-efficient rendering for scalable
training as well as fast 3D reconstruction at inference time. To overcome local
minima inherent to sparse and locally supported representations, we predict a
dense probability distribution over 3D and sample Gaussian means from that
probability distribution. We make this sampling operation differentiable via a
reparameterization trick, allowing us to back-propagate gradients through the
Gaussian splatting representation. We benchmark our method on wide-baseline
novel view synthesis on the real-world RealEstate10k and ACID datasets, where
we outperform state-of-the-art light field transformers and accelerate
rendering by 2.5 orders of magnitude while reconstructing an interpretable and
editable 3D radiance field.
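The reparameterization trick mentioned above can be illustrated with a minimal NumPy sketch of the standard Gaussian case (illustrative only; pixelSplat's actual operation samples Gaussian means from a predicted dense distribution over 3D, and all names here are hypothetical):

```python
import numpy as np

def reparameterized_sample(mu, log_sigma, rng):
    """Sample z ~ N(mu, sigma^2) as a deterministic function of the
    parameters plus parameter-free noise, so gradients can be
    back-propagated through mu and log_sigma."""
    eps = rng.standard_normal(mu.shape)   # noise independent of parameters
    return mu + np.exp(log_sigma) * eps

rng = np.random.default_rng(0)
mu = np.zeros(3)
log_sigma = np.log(np.full(3, 0.5))
samples = np.stack([reparameterized_sample(mu, log_sigma, rng)
                    for _ in range(10000)])
# Empirical mean should be near 0 and standard deviation near 0.5.
```

Because the noise is drawn independently of the parameters, the sample is differentiable in `mu` and `log_sigma`, which is what allows gradients to flow through a sampling operation.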
( 2 min )
To better understand the outputs of deep neural networks (DNNs),
attribution-based methods have become an important approach to model
interpretability; they assign each input dimension a score indicating its
importance to the model outcome. Notably, attribution methods use the axioms of sensitivity
and implementation invariance to ensure the validity and reliability of
attribution results. Yet, the existing attribution methods present challenges
for effective interpretation and efficient computation. In this work, we
introduce MFABA, an attribution algorithm that adheres to these axioms, as a
novel method for interpreting DNNs. Additionally, we provide a theoretical proof
and in-depth analysis of the MFABA algorithm, and conduct a large-scale
experiment. The results demonstrate its superiority: MFABA runs more than
101.5142 times faster than state-of-the-art attribution algorithms. The effectiveness of
MFABA is thoroughly evaluated through the statistical analysis in comparison to
other methods, and the full implementation package is open-source at:
https://github.com/LMBTough/MFABA
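The abstract does not spell out MFABA's update rule, but the axioms it cites (sensitivity and implementation invariance) are the ones satisfied by integrated gradients; a minimal NumPy sketch of that well-known baseline (not the MFABA implementation, and all names here are illustrative):

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Integrated gradients: (x - baseline) times the average gradient
    along the straight-line path from baseline to x. Satisfies the
    sensitivity and implementation-invariance axioms by construction."""
    alphas = (np.arange(steps) + 0.5) / steps        # midpoint rule
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.stack([grad_fn(p) for p in path])
    return (x - baseline) * grads.mean(axis=0)

# Toy linear model f(x) = w . x, whose gradient is w everywhere; for a
# linear model the attribution recovers w * (x - baseline) exactly.
w = np.array([1.0, -2.0, 3.0])
attr = integrated_gradients(lambda p: w, np.ones(3), np.zeros(3))
```

The path integral is what makes the method implementation-invariant: two networks computing the same function get the same attributions, since only gradients of the function matter.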
( 2 min )
This study explores the application of anomaly detection (AD) methods in
imbalanced learning tasks, focusing on fraud detection using real online credit
card payment data. We assess the performance of several recent AD methods and
compare their effectiveness against standard supervised learning methods.
Offering evidence of distribution shift within our dataset, we analyze its
impact on the tested models' performances. Our findings reveal that LightGBM
exhibits significantly superior performance across all evaluated metrics but
suffers more from distribution shifts than AD methods. Furthermore, our
investigation reveals that LightGBM also captures the majority of frauds
detected by AD methods. This observation challenges the potential benefits of
ensemble methods to combine supervised, and AD approaches to enhance
performance. In summary, this research provides practical insights into the
utility of these techniques in real-world scenarios, showing LightGBM's
superiority in fraud detection while highlighting challenges related to
distribution shifts.
( 2 min )
Bayesian optimization (BO) is a sample-efficient method and has been widely
used for optimizing expensive black-box functions. Recently, there has been
considerable interest in the BO literature in optimizing functions that are
affected by a context variable in the environment, which is uncontrollable by
decision makers. In this paper, we focus on the optimization of a function's
expectation over a continuous context variable, subject to an unknown
distribution. To address this problem, we propose two algorithms that employ
kernel density estimation to learn the probability density function (PDF) of the
continuous context variable online. The first algorithm is simpler and
directly optimizes the expectation under the estimated PDF. Considering that
the estimated PDF may have high estimation error when the true distribution is
complicated, we further propose the second algorithm that optimizes the
distributionally robust objective. Theoretical results demonstrate that both
algorithms have sub-linear Bayesian cumulative regret on the expectation
objective. Furthermore, we conduct numerical experiments to empirically
demonstrate the effectiveness of our algorithms.
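The core ingredient above, estimating the context PDF by kernel density estimation and then evaluating an expectation under it, can be sketched as follows (a generic Gaussian-kernel KDE with a hand-picked bandwidth; not the paper's algorithms, and the objective f is a hypothetical toy):

```python
import numpy as np

def kde_pdf(samples, grid, bandwidth):
    """Gaussian kernel density estimate of the context PDF on a grid."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(samples) * bandwidth)

rng = np.random.default_rng(0)
contexts = rng.normal(0.0, 1.0, size=2000)   # observed context values
grid = np.linspace(-4.0, 4.0, 401)
pdf = kde_pdf(contexts, grid, bandwidth=0.3)

# Estimate E_c[f(x, c)] for f(x, c) = -(x - c)^2 at x = 0 by quadrature;
# the true value is -E[c^2] = -1 (up to smoothing and sampling error).
dx = grid[1] - grid[0]
expectation = ((-(0.0 - grid) ** 2) * pdf).sum() * dx
```

An online variant would simply append each newly observed context to `contexts` before re-evaluating the estimate.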
( 2 min )
Policy gradient methods enjoy strong practical performance in numerous tasks
in reinforcement learning. Their theoretical understanding in multiagent
settings, however, remains limited, especially beyond two-player competitive
and potential Markov games. In this paper, we develop a new framework to
characterize optimistic policy gradient methods in multi-player Markov games
with a single controller. Specifically, under the further assumption that the
game exhibits an equilibrium collapse, in that the marginals of coarse
correlated equilibria (CCE) induce Nash equilibria (NE), we show convergence to
stationary $\epsilon$-NE in $O(1/\epsilon^2)$ iterations, where $O(\cdot)$
suppresses polynomial factors in the natural parameters of the game. Such an
equilibrium collapse is well-known to manifest itself in two-player zero-sum
Markov games, but also occurs even in a class of multi-player Markov games with
separable interactions, as established by recent work. As a result, we bypass
known complexity barriers for computing stationary NE when either of our
assumptions fails. Our approach relies on a natural generalization of the
classical Minty property that we introduce, which we anticipate to have further
applications beyond Markov games.
( 2 min )
Tabular data analysis is crucial in various fields, and large language models
show promise in this area. However, current research mostly focuses on
rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like
forecasting and chart generation. To address this gap, we developed the
Text2Analysis benchmark, incorporating advanced analysis tasks that go beyond
the SQL-compatible operations and require more in-depth analysis. We also
develop five innovative and effective annotation methods, harnessing the
capabilities of large language models to enhance data quality and quantity.
Additionally, we include unclear queries that resemble real-world user
questions to test how well models can understand and tackle such challenges.
Finally, we collect 2249 query-result pairs with 347 tables. We evaluate five
state-of-the-art models using three different metrics and the results show that
our benchmark introduces considerable challenges in the field of
tabular data analysis, paving the way for more advanced research opportunities.
( 2 min )
The multi-armed bandit problem arises in many real-life scenarios where
arms must be sampled in batches because the agent can wait only a limited time
for feedback. Such applications include biological experimentation and online
marketing. The problem is further complicated when the number of arms is large
and the number of batches is small. We consider pure exploration in a batched
multi-armed bandit problem. We introduce a general linear programming framework
that can incorporate objectives of different theoretical settings in best arm
identification. The linear program leads to a two-stage algorithm that can
achieve good theoretical properties. We demonstrate by numerical studies that
the algorithm also has good performance compared to certain UCB-type or
Thompson sampling methods.
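The batched pure-exploration setting above can be made concrete with a simple baseline, batched successive elimination (this is a generic illustration, not the paper's LP-based two-stage algorithm; the arm means and constants are made up):

```python
import numpy as np

def batched_best_arm(means, n_batches=4, batch_pulls=200, seed=0):
    """Batched pure exploration: each batch pulls every surviving arm
    equally, then eliminates arms whose confidence interval falls below
    the best lower confidence bound."""
    rng = np.random.default_rng(seed)
    k = len(means)
    total = np.zeros(k)
    counts = np.zeros(k)
    active = list(range(k))
    for _ in range(n_batches):
        for a in active:                      # feedback arrives per batch
            total[a] += rng.normal(means[a], 1.0, size=batch_pulls).sum()
            counts[a] += batch_pulls
        est = total / np.maximum(counts, 1)
        radius = np.sqrt(2.0 * np.log(4.0 * k * n_batches)
                         / np.maximum(counts, 1))
        lower = max(est[a] - radius[a] for a in active)
        active = [a for a in active if est[a] + radius[a] >= lower]
    est = total / np.maximum(counts, 1)
    return max(active, key=lambda a: est[a])

best = batched_best_arm(np.array([0.1, 0.3, 0.8, 0.2]))
# Arm 2 has the largest mean and should be identified as best.
```

The key constraint the batched setting imposes is visible here: estimates are only updated after each batch completes, so the number of adaptive decisions equals the number of batches, not the number of pulls.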
( 2 min )
Uncertainty estimation is a key issue when considering the application of
deep neural network methods in science and engineering. In this work, we
introduce a novel algorithm that quantifies epistemic uncertainty via Monte
Carlo sampling from a tempered posterior distribution. It combines the well
established Metropolis Adjusted Langevin Algorithm (MALA) with momentum-based
optimization using Adam and leverages a prolate proposal distribution, to
efficiently draw from the posterior. We prove that the constructed chain admits
the Gibbs posterior as an invariant distribution and converges to this Gibbs
posterior in total variation distance. Numerical evaluations are postponed to a
first revision.
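A minimal sketch of the MALA building block named above, on a one-dimensional standard normal target (plain MALA only; the paper's Adam-based preconditioning, prolate proposal, and tempering are not reproduced here):

```python
import numpy as np

def mala_step(x, log_p, grad_log_p, step, rng):
    """One Metropolis Adjusted Langevin step: a Langevin proposal
    followed by a Metropolis accept/reject that keeps the target
    distribution invariant."""
    prop = (x + step * grad_log_p(x)
            + np.sqrt(2.0 * step) * rng.standard_normal(x.shape))

    def log_q(a, b):  # log proposal density (up to constants) of a given b
        diff = a - b - step * grad_log_p(b)
        return -(diff ** 2).sum() / (4.0 * step)

    log_alpha = log_p(prop) - log_p(x) + log_q(x, prop) - log_q(prop, x)
    return prop if np.log(rng.random()) < log_alpha else x

# Target: standard normal. The chain's long-run variance should be near 1.
rng = np.random.default_rng(0)
log_p = lambda z: -0.5 * (z ** 2).sum()
grad_log_p = lambda z: -z
x = np.zeros(1)
chain = []
for _ in range(20000):
    x = mala_step(x, log_p, grad_log_p, step=0.5, rng=rng)
    chain.append(x[0])
variance = float(np.var(chain[2000:]))
```

The Metropolis correction is what gives the invariance guarantee the abstract refers to: even with a biased proposal, the accept/reject step leaves the target distribution exactly invariant.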
( 2 min )
Pufferfish privacy is a flexible generalization of differential privacy that
allows one to model arbitrary secrets and the adversary's prior knowledge about the
data. Unfortunately, designing general and tractable Pufferfish mechanisms that
do not compromise utility is challenging. Furthermore, this framework does not
provide the composition guarantees needed for a direct use in iterative machine
learning algorithms. To mitigate these issues, we introduce a Rényi
divergence-based variant of Pufferfish and show that it allows us to extend the
applicability of the Pufferfish framework. We first generalize the Wasserstein
mechanism to cover a wide range of noise distributions and introduce several
ways to improve its utility. We also derive stronger guarantees against
out-of-distribution adversaries. Finally, as an alternative to composition, we
prove privacy amplification results for contractive noisy iterations and
showcase the first use of Pufferfish in private convex optimization. A common
ingredient underlying our results is the use and extension of shift reduction
lemmas.
( 2 min )
Convex clustering is a modern method with both hierarchical and $k$-means
clustering characteristics. Although convex clustering can capture complex
clustering structures hidden in data, the existing convex clustering algorithms
are not scalable to large data sets with sample sizes greater than several
thousands. Moreover, it is known that convex clustering sometimes fails to
produce a complete hierarchical clustering structure. This issue arises if
clusters split up or the minimum number of possible clusters is larger than the
desired number of clusters. In this paper, we propose convex clustering through
majorization-minimization (CCMM) -- an iterative algorithm that uses cluster
fusions and a highly efficient updating scheme derived using diagonal
majorization. Additionally, we explore different strategies to ensure that the
hierarchical clustering structure terminates in a single cluster. With a
current desktop computer, CCMM efficiently solves convex clustering problems
featuring over one million objects in seven-dimensional space, achieving a
solution time of 51 seconds on average.
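The objective that convex clustering minimizes can be written down directly; the following sketch evaluates it and shows why fusing nearby centroids can lower the loss (the data points and penalty weight are made up, and this is the plain objective, not the CCMM algorithm):

```python
import numpy as np

def convex_clustering_loss(X, U, lam):
    """Convex clustering objective: a quadratic fidelity term plus a
    fusion penalty on pairs of centroid rows; rows of U that fuse to
    the same point form one cluster."""
    fidelity = 0.5 * ((X - U) ** 2).sum()
    n = len(X)
    fusion = sum(np.linalg.norm(U[i] - U[j])
                 for i in range(n) for j in range(i + 1, n))
    return fidelity + lam * fusion

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
# Keeping every centroid at its own data point vs. fusing the two
# nearby points: for moderate lam, fusing lowers the objective.
loose = convex_clustering_loss(X, X.copy(), lam=0.2)
fused = X.copy()
fused[0] = fused[1] = X[:2].mean(axis=0)
tight = convex_clustering_loss(X, fused, lam=0.2)
```

Sweeping `lam` from 0 upward fuses more and more centroids, which is how the method traces out a (usually, but not always complete) hierarchical clustering path.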
( 2 min )
This short note describes and proves a connectedness property which was
introduced in Blocher et al. [2023] in the context of data depth functions for
partial orders. The connectedness property gives a structural insight into
union-free generic sets. These sets, presented in Blocher et al. [2023], are
defined by using a closure operator on the set of all partial orders which
naturally appears within the theory of formal concept analysis. In the language
of formal concept analysis, the property of connectedness can be vividly
proven. However, since within Blocher et al. [2023] we did not discuss formal
concept analysis, we outsourced the proof to this note.
( 2 min )
This paper considers the epistemic justification for a simplicity preference
in inductive inference that may be obtained from the machine learning framework
of statistical learning theory. Uniting elements from both earlier arguments
suggesting and rejecting such a justification, the paper spells out a qualified
means-ends and model-relative justificatory argument, built on statistical
learning theory's central mathematical learning guarantee for the method of
empirical risk minimization.
( 2 min )
Generating counterfactual explanations is one of the most effective
approaches for uncovering the inner workings of black-box neural network models
and building user trust. While remarkable strides have been made in generative
modeling using diffusion models in domains like vision, their utility in
generating counterfactual explanations in structured modalities remains
unexplored. In this paper, we introduce the Structured Counterfactual Diffuser,
or SCD, the first plug-and-play framework leveraging diffusion for generating
counterfactual explanations in structured data. SCD learns the underlying data
distribution via a diffusion model which is then guided at test time to
generate counterfactuals for any arbitrary black-box model, input, and desired
prediction. Our experiments show that our counterfactuals not only exhibit high
plausibility compared to the existing state-of-the-art but also show
significantly better proximity and diversity.
( 2 min )
Recent works have shown that physics-inspired architectures allow the
training of deep graph neural networks (GNNs) without oversmoothing. The role
of the physics is unclear, however, with successful examples of both
reversible (e.g., Hamiltonian) and irreversible (e.g., diffusion) phenomena
producing comparable results despite diametrically opposed mechanisms, and
further complications arising due to empirical departures from mathematical
theory. This work presents a series of novel GNN architectures based upon
structure-preserving bracket-based dynamical systems, which are provably
guaranteed to either conserve energy or generate positive dissipation with
increasing depth. It is shown that the theoretically principled framework
employed here allows for inherently explainable constructions, which
contextualize departures from theory in current architectures and better
elucidate the roles of reversibility and irreversibility in network
performance.
( 2 min )
Accurate land use maps, describing the territory from an anthropic
utilisation point of view, are useful tools for land management and planning.
To produce them, the use of optical images alone remains limited. It is
therefore necessary to make use of several heterogeneous sources, each carrying
complementary or contradictory information due to their imperfections or their
different specifications. This study compares two different approaches, i.e., a
pre-classification and a post-classification fusion approach, for combining
several sources of spatial data in the context of land use classification. The
approaches are applied on authoritative land use data located in the Gers
department in the southwest of France. Pre-classification fusion, while not
explicitly modeling imperfections, has the best final results, reaching an
overall accuracy of 97% and a macro-mean F1 score of 88%.
( 2 min )
Existing algorithms for reinforcement learning from human feedback (RLHF) can
incentivize responses at odds with preferences because they are based on models
that assume independence of irrelevant alternatives (IIA). The perverse
incentives induced by IIA give rise to egregious behavior when innovating on
query formats or learning algorithms.
( 2 min )
We tackle the problem of sampling from intractable high-dimensional density
functions, a fundamental task that often appears in machine learning and
statistics. We extend recent sampling-based approaches that leverage controlled
stochastic processes to model approximate samples from these target densities.
The main drawback of these approaches is that the training objective requires
full trajectories to compute, resulting in sluggish credit assignment because
the learning signal is present only at the
terminal time. In this work, we present Diffusion Generative Flow Samplers
(DGFS), a sampling-based framework where the learning process can be tractably
broken down into short partial trajectory segments, via parameterizing an
additional "flow function". Our method takes inspiration from the theory
developed for generative flow networks (GFlowNets), allowing us to make use of
intermediate learning signals. Through various challenging experiments, we
demonstrate that DGFS achieves more accurate estimates of the normalization
constant than closely-related prior methods.
( 2 min )
Through this paper, we introduce a novel driver cognitive load assessment
dataset, CL-Drive, which contains Electroencephalogram (EEG) signals along with
other physiological signals such as Electrocardiography (ECG) and Electrodermal
Activity (EDA) as well as eye tracking data. The data was collected from 21
subjects while driving in an immersive vehicle simulator, in various driving
conditions, to induce different levels of cognitive load in the subjects. The
tasks consisted of 9 complexity levels for 3 minutes each. Each driver reported
their subjective cognitive load every 10 seconds throughout the experiment. The
dataset contains the subjective cognitive load recorded as ground truth. In
this paper, we also provide benchmark classification results for different
machine learning and deep learning models for both binary and ternary label
distributions. We followed two evaluation criteria, namely 10-fold
cross-validation and leave-one-subject-out (LOSO). We trained our models on both hand-crafted
features as well as on raw data.
( 3 min )
The effectiveness of digital treatments can be measured by requiring patients
to self-report their state through applications; however, this can be
overwhelming and cause disengagement. We conduct a study to explore the impact
of gamification on self-reporting. Our approach involves the creation of a
system to assess cognitive load (CL) through the analysis of
photoplethysmography (PPG) signals. The data from 11 participants is utilized
to train a machine learning model to detect CL. Subsequently, we create two
versions of surveys: a gamified and a traditional one. We estimate the CL
experienced by other participants (13) while completing surveys. We find that
CL detector performance can be enhanced via pre-training on stress detection
tasks. For 10 out of 13 participants, a personalized CL detector can achieve an
F1 score above 0.7. We find no difference between the gamified and non-gamified
surveys in terms of CL but participants prefer the gamified version.
( 3 min )
Multi-relational clustering is a challenging task because the
diverse semantic information conveyed in multi-layer graphs is difficult to
extract and fuse. Recent methods integrate topology structure and node
attribute information through graph filtering. However, they often use a
low-pass filter without fully considering the correlation among multiple
graphs. To overcome this drawback, we propose to learn a graph filter motivated
by the theoretical analysis of Barlow Twins. We find that input with a negative
semi-definite inner product provides a lower bound for Barlow Twins loss, which
prevents it from reaching a better solution. We thus learn a filter that yields
an upper bound for Barlow Twins. Afterward, we design a simple clustering
architecture and demonstrate its state-of-the-art performance on four benchmark
datasets.
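The low-pass graph filtering that the abstract builds on can be sketched in a few lines (a generic filter of the form (I - L/2)^k applied to node features; the learned filter of the paper differs, and the toy graph is made up):

```python
import numpy as np

def low_pass_filter(adj, features, k=2):
    """Apply the k-step low-pass graph filter (I - L/2)^k X, where L is
    the symmetric normalized Laplacian; this smooths node features over
    graph neighborhoods."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
    h = np.eye(len(adj)) - 0.5 * lap
    out = features
    for _ in range(k):
        out = h @ out
    return out

# Two connected nodes: filtering pulls their features toward each other.
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
X = np.array([[1.0], [0.0]])
smoothed = low_pass_filter(adj, X, k=2)
# Both nodes end up at the average feature value 0.5.
```

Attenuating high graph frequencies this way merges the features of tightly connected nodes, which is exactly the property the paper argues should be learned rather than fixed a priori.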
( 2 min )
Stochastic optimal control of dynamical systems is a crucial challenge in
sequential decision-making. Recently, control-as-inference approaches have had
considerable success, providing a viable risk-sensitive framework to address
the exploration-exploitation dilemma. Nonetheless, a majority of these
techniques only invoke the inference-control duality to derive a modified risk
objective that is then addressed within a reinforcement learning framework.
This paper introduces a novel perspective by framing risk-sensitive stochastic
control as Markovian score climbing under samples drawn from a conditional
particle filter. Our approach, while purely inference-centric, provides
asymptotically unbiased estimates for gradient-based policy optimization with
optimal importance weighting and no explicit value function learning. To
validate our methodology, we apply it to the task of learning neural
non-Gaussian feedback policies, showcasing its efficacy on numerical benchmarks
of stochastic dynamical systems.
( 2 min )
Despite the great popularity of virtual screening of existing compound
libraries, the search for new potential drug candidates also takes advantage of
generative protocols, where new compound suggestions are enumerated using
various algorithms. To increase the activity potency of generative approaches,
they have recently been coupled with molecular docking, a leading methodology
of structure-based drug design. In this review, we summarize progress since
docking-based generative models emerged. We propose a new taxonomy for these
methods and discuss their importance for the field of computer-aided drug
design. In addition, we discuss the most promising directions for further
development of generative protocols coupled with docking.
( 2 min )
We study convergence rates of loss- and uncertainty-based active learning
algorithms under various assumptions. First, we provide a set of conditions
under which a convergence rate guarantee holds, and use this for linear
classifiers and linearly separable datasets to show convergence rate guarantees
for loss-based sampling and different loss functions. Second, we provide a
framework that allows us to derive convergence rate bounds for loss-based
sampling by deploying known convergence rate bounds for stochastic gradient
descent algorithms. Third and last, we propose an active learning algorithm
that combines sampling of points with the stochastic Polyak step size. We show a
condition on the sampling that ensures a convergence rate guarantee for this
algorithm for smooth convex loss functions. Our numerical results demonstrate
efficiency of our proposed algorithm.
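The Polyak step size used above is gamma_t = (f(x_t) - f*) / ||g_t||^2; a minimal deterministic sketch on a toy quadratic (the full-batch variant, without the sampling scheme the paper combines it with):

```python
import numpy as np

def polyak_sgd(loss_fn, grad_fn, x, steps=100, f_star=0.0):
    """Gradient descent with the Polyak step size
    gamma_t = (f(x_t) - f*) / ||g_t||^2, which requires no tuned
    learning rate when the optimal value f* is known."""
    for _ in range(steps):
        g = grad_fn(x)
        gnorm2 = float((g ** 2).sum())
        if gnorm2 < 1e-12:        # already at a stationary point
            break
        x = x - (loss_fn(x) - f_star) / gnorm2 * g
    return x

# Quadratic loss f(x) = 0.5 ||x||^2 with minimum value f* = 0 at the origin.
x_final = polyak_sgd(lambda z: 0.5 * (z ** 2).sum(),
                     lambda z: z,
                     np.array([3.0, -4.0]))
```

On this quadratic the step size evaluates to 1/2 at every iterate, so the method halves the distance to the optimum each step.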
( 2 min )
Industrial robots are applied in a widening range of industries, but robot
programming mostly remains a task limited to programming experts. We propose a
natural language-based assistant for programming of advanced, industrial
robotic applications and investigate strategies for domain-specific fine-tuning
of foundation models with limited data and compute.
( 2 min )
We propose to train neural networks (NNs) using a novel variant of the
"Additively Preconditioned Trust-region Strategy" (APTS). The proposed method
is based on a parallelizable additive domain decomposition approach applied to
the neural network's parameters. Built upon the TR framework, the APTS method
ensures global convergence towards a minimizer. Moreover, it eliminates the
need for computationally expensive hyper-parameter tuning, as the TR algorithm
automatically determines the step size in each iteration. We demonstrate the
capabilities, strengths, and limitations of the proposed APTS training method
by performing a series of numerical experiments. The presented numerical study
includes a comparison with widely used training methods such as SGD, Adam,
LBFGS, and the standard TR method.
( 2 min )
Transformer-based Large Language Models (LLMs) have become a fixture in
modern machine learning. Correspondingly, significant resources are allocated
towards research that aims to further advance this technology, typically
resulting in models of increasing size that are trained on increasing amounts
of data. This work, however, demonstrates the surprising result that it is
often possible to significantly improve the performance of LLMs by selectively
removing higher-order components of their weight matrices. This simple
intervention, which we call LAyer-SElective Rank reduction (LASER), can be done
on a model after training has completed, and requires no additional parameters
or data. We show extensive experiments demonstrating the generality of this
finding across language models and datasets, and provide in-depth analyses
offering insights into both when LASER is effective and the mechanism by which
it operates.
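The intervention described above amounts to replacing a weight matrix with a low-rank truncation of its SVD; a minimal sketch (the keep fraction and matrix are illustrative, and real LASER edits target specific layers of a trained LLM):

```python
import numpy as np

def rank_reduce(weight, keep_fraction):
    """Replace a weight matrix by its best low-rank approximation,
    keeping only the top singular components (a LASER-style edit)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    k = max(1, int(round(len(s) * keep_fraction)))
    return (u[:, :k] * s[:k]) @ vt[:k]

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
w_low = rank_reduce(w, keep_fraction=0.25)
rank = int(np.linalg.matrix_rank(w_low))
# The reduced matrix keeps the same shape but has rank at most 16,
# retaining only the dominant structure of the original weights.
```

Since the edit is applied post hoc to a frozen matrix, it adds no parameters and needs no data, matching the abstract's claim.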
( 2 min )
InvertibleNetworks.jl is a Julia package designed for the scalable
implementation of normalizing flows, a method for density estimation and
sampling in high-dimensional distributions. This package excels in memory
efficiency by leveraging the inherent invertibility of normalizing flows, which
significantly reduces memory requirements during backpropagation compared to
existing normalizing flow packages that rely on automatic differentiation
frameworks. InvertibleNetworks.jl has been adapted for diverse applications,
including seismic imaging, medical imaging, and CO2 monitoring, demonstrating
its effectiveness in learning high-dimensional distributions.
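The memory saving comes from invertibility: a coupling layer's input can be recomputed exactly from its output, so activations need not be stored for backpropagation. A minimal NumPy sketch of an additive coupling layer (a generic construction, not InvertibleNetworks.jl's API, which is in Julia):

```python
import numpy as np

def coupling_forward(x, shift_net):
    """Additive coupling layer: shift one half of the input by a
    function of the other half. The first half passes through
    unchanged, which makes the layer exactly invertible."""
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([x1, x2 + shift_net(x1)], axis=-1)

def coupling_inverse(y, shift_net):
    """Exact inverse: subtract the same shift, recomputed from the
    untouched half."""
    y1, y2 = np.split(y, 2, axis=-1)
    return np.concatenate([y1, y2 - shift_net(y1)], axis=-1)

shift_net = np.tanh                      # any function of the first half works
x = np.array([0.3, -1.2, 0.7, 2.0])
y = coupling_forward(x, shift_net)
x_rec = coupling_inverse(y, shift_net)   # exact reconstruction of x
```

Stacking such layers (alternating which half is shifted) yields a normalizing flow whose intermediate activations can all be recovered from the output during the backward pass.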
( 2 min )
Utilizing task-invariant prior knowledge extracted from related tasks,
meta-learning is a principled framework that empowers learning a new task,
especially when data records are limited. A fundamental challenge in
meta-learning is how to quickly "adapt" the extracted prior in order to train a
task-specific model within a few optimization steps. Existing approaches deal
with this challenge using a preconditioner that enhances convergence of the
per-task training process. Though effective in representing locally a quadratic
training loss, these simple linear preconditioners can hardly capture complex
loss geometries. The present contribution addresses this limitation by learning
a nonlinear mirror map, which induces a versatile distance metric to enable
capturing and optimizing a wide range of loss geometries, hence facilitating
the per-task training. Numerical tests on few-shot learning datasets
demonstrate the superior expressiveness and convergence of the advocated
approach.
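A mirror map induces updates of the form x <- (grad psi)^{-1}(grad psi(x) - lr * g); a minimal sketch with the classical entropic mirror map (a standard textbook example, not the learned nonlinear map of the paper, and the gradient is a made-up toy):

```python
import numpy as np

def mirror_descent_step(x, grad, lr, psi_grad, psi_grad_inv):
    """One mirror descent step: map the iterate to the dual space with
    the mirror map's gradient, take a gradient step there, map back."""
    return psi_grad_inv(psi_grad(x) - lr * grad)

# Entropic mirror map psi(x) = sum_i x_i log x_i on the positive orthant:
# grad psi(x) = 1 + log x, with inverse exp(u - 1), which yields the
# multiplicative update x <- x * exp(-lr * grad).
psi_grad = lambda x: 1.0 + np.log(x)
psi_grad_inv = lambda u: np.exp(u - 1.0)

x = np.array([0.5, 0.5])
g = np.array([1.0, -1.0])              # toy gradient
x_new = mirror_descent_step(x, g, 0.1, psi_grad, psi_grad_inv)
```

Choosing a different mirror map changes the implicit distance metric of the update, which is the flexibility the paper exploits by learning the map instead of fixing it.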
( 2 min )
Many inference scenarios rely on extracting relevant information from known
data in order to make future predictions. When the underlying stochastic
process satisfies certain assumptions, there is a direct mapping between its
exact classical and quantum simulators, with the latter asymptotically using
less memory. Here we focus on studying whether such quantum advantage persists
when those assumptions are not satisfied, and the model is doomed to have
imperfect accuracy. By studying the trade-off between accuracy and memory
requirements, we show that quantum models can reach the same accuracy with less
memory, or alternatively, better accuracy with the same memory. Finally, we
discuss the implications of this result for learning tasks.
( 2 min )
Physics-based simulations can be very time-consuming and computationally
demanding tasks. One way of accelerating these processes is by making use of data-driven
surrogate models that learn from existing simulations. Ensembling methods are
particularly relevant in this domain as their smoothness properties coincide
with the smoothness of physical phenomena. The drawback is that they can remain
costly. This research project focused on studying Packed-Ensembles that
generalize Deep Ensembles but remain faster to train. Several models have been
trained and compared in terms of multiple important metrics. PE(8,4,1) has been
identified as the clear winner in this particular task, outperforming its Deep
Ensemble counterpart while reducing the training time by 25%.
( 2 min )
The generation of cold atom clouds is a complex process which involves the
optimization of noisy data in high dimensional parameter spaces. Optimization
can be challenging both in and especially outside of the lab due to lack of
time, expertise, or access for lengthy manual optimization. In recent years, it
was demonstrated that machine learning offers a solution since it can optimize
high dimensional problems quickly, without knowledge of the experiment itself.
In this paper we present results showing the benchmarking of nine different
optimization techniques and implementations, alongside their ability to
optimize a Rubidium (Rb) cold atom experiment. The investigations are performed
on a 3D $^{87}$Rb molasses with 10 and 18 adjustable parameters, respectively,
where the atom number obtained by absorption imaging was chosen as the test
problem. We further compare the best performing optimizers under different
effective noise conditions by reducing the Signal-to-Noise ratio of the images
via adapting the atomic vapor pressure in the 2D+ MOT and the detection laser
frequency stability.
( 2 min )
Federated bilevel optimization (FBO) has shown great potential recently in
machine learning and edge computing due to the emerging nested optimization
structure in meta-learning, fine-tuning, hyperparameter tuning, etc. However,
existing FBO algorithms often involve complicated computations and require
multiple sub-loops per iteration, each of which contains a number of
communication rounds. In this paper, we propose a simple and flexible FBO
framework named SimFBO, which is easy to implement without sub-loops, and
includes a generalized server-side aggregation and update for improving
communication efficiency. We further propose System-level heterogeneity robust
FBO (ShroFBO) as a variant of SimFBO with stronger resilience to heterogeneous
local computation. We show that SimFBO and ShroFBO provably achieve a linear
convergence speedup with partial client participation and client sampling
without replacement, as well as improved sample and communication complexities.
Experiments demonstrate the effectiveness of the proposed methods over existing
FBO algorithms.
( 2 min )
In this paper, we revisit the bilevel optimization problem, in which the
upper-level objective function is generally nonconvex and the lower-level
objective function is strongly convex. Although this type of problem has been
studied extensively, it still remains an open question how to achieve an
${O}(\epsilon^{-1.5})$ sample complexity in Hessian/Jacobian-free stochastic
bilevel optimization without any second-order derivative computation. To fill
this gap, we propose a novel Hessian/Jacobian-free bilevel optimizer named
FdeHBO, which features a simple fully single-loop structure, a projection-aided
finite-difference Hessian/Jacobian-vector approximation, and momentum-based
updates. Theoretically, we show that FdeHBO requires ${O}(\epsilon^{-1.5})$
iterations (each using ${O}(1)$ samples and only first-order gradient
information) to find an $\epsilon$-accurate stationary point. As far as we
know, this is the first Hessian/Jacobian-free method with an
${O}(\epsilon^{-1.5})$ sample complexity for nonconvex-strongly-convex
stochastic bilevel optimization.
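The finite-difference Hessian/Jacobian-vector approximation at the heart of such methods needs only gradient calls; a minimal sketch via central differences (the quadratic test function is made up, and this omits FdeHBO's projection and momentum machinery):

```python
import numpy as np

def fd_hvp(grad_fn, x, v, eps=1e-5):
    """Approximate the Hessian-vector product H(x) v using two gradient
    evaluations and central differences -- no second-order derivatives
    are ever computed."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2.0 * eps)

# Quadratic f(x) = 0.5 x^T A x has gradient A x and Hessian A, so the
# finite-difference product should match A @ v.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
grad_fn = lambda z: A @ z
x = np.array([1.0, -1.0])
v = np.array([0.3, 0.7])
hv = fd_hvp(grad_fn, x, v)
```

For quadratics the central-difference formula is exact up to floating-point error; for general smooth functions the error is O(eps^2).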
( 2 min )
We tackle the problem of sampling from intractable high-dimensional density
functions, a fundamental task that often appears in machine learning and
statistics. We extend recent sampling-based approaches that leverage controlled
stochastic processes to model approximate samples from these target densities.
The main drawback of these approaches is that the training objective requires
full trajectories to compute, resulting in sluggish credit assignment issues
due to use of entire trajectories and a learning signal present only at the
terminal time. In this work, we present Diffusion Generative Flow Samplers
(DGFS), a sampling-based framework where the learning process can be tractably
broken down into short partial trajectory segments, via parameterizing an
additional "flow function". Our method takes inspiration from the theory
developed for generative flow networks (GFlowNets), allowing us to make use of
intermediate learning signals. Through various challenging experiments, we
demonstrate that DGFS achieves more accurate estimates of the normalization
constant than closely-related prior methods.
( 2
min )
Uncertainty estimation is a key issue when considering the application of
deep neural network methods in science and engineering. In this work, we
introduce a novel algorithm that quantifies epistemic uncertainty via Monte
Carlo sampling from a tempered posterior distribution. It combines the
well-established Metropolis-Adjusted Langevin Algorithm (MALA) with
momentum-based optimization using Adam and leverages a prolate proposal
distribution to
efficiently draw from the posterior. We prove that the constructed chain admits
the Gibbs posterior as an invariant distribution and converges to this Gibbs
posterior in total variation distance. Numerical evaluations are postponed to a
first revision.
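A plain MALA step looks like the sketch below (the paper adds Adam-style momentum and a prolate proposal on top of this; the Gaussian target and step size here are illustrative, not from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)

def U(x):
    # Negative log-density of a standard Gaussian target (illustrative choice)
    return 0.5 * x @ x

def grad_U(x):
    return x

def mala_step(x, step=0.1):
    # Langevin proposal: gradient drift plus Gaussian noise
    prop = x - step * grad_U(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)

    def log_q(a, b):
        # Log-density (up to a constant) of proposing a when at b
        diff = a - (b - step * grad_U(b))
        return -(diff @ diff) / (4 * step)

    # Metropolis correction keeps exp(-U) invariant
    log_alpha = U(x) - U(prop) + log_q(x, prop) - log_q(prop, x)
    return prop if np.log(rng.uniform()) < log_alpha else x

x = np.zeros(2)
for _ in range(500):
    x = mala_step(x)
```

The accept/reject correction is what makes the chain exactly invariant for the target, which is the property the paper's convergence proof extends to its preconditioned variant.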
( 2
min )
The multi-armed bandit problem arises in many real-life scenarios where arms
must be sampled in batches because the agent can wait only a limited time for
feedback. Such applications include biological experimentation and online
marketing. The problem is further complicated when the number of arms is large
and the number of batches is small. We consider pure exploration in a batched
multi-armed bandit problem. We introduce a general linear programming framework
that can incorporate objectives of different theoretical settings in best arm
identification. The linear program leads to a two-stage algorithm that can
achieve good theoretical properties. We demonstrate by numerical studies that
the algorithm also has good performance compared to certain UCB-type or
Thompson sampling methods.
( 2
min )
Convex clustering is a modern method with both hierarchical and $k$-means
clustering characteristics. Although convex clustering can capture complex
clustering structures hidden in data, the existing convex clustering algorithms
are not scalable to large data sets with sample sizes greater than several
thousand. Moreover, it is known that convex clustering sometimes fails to
produce a complete hierarchical clustering structure. This issue arises if
clusters split up or the minimum number of possible clusters is larger than the
desired number of clusters. In this paper, we propose convex clustering through
majorization-minimization (CCMM) -- an iterative algorithm that uses cluster
fusions and a highly efficient updating scheme derived using diagonal
majorization. Additionally, we explore different strategies to ensure that the
hierarchical clustering structure terminates in a single cluster. With a
current desktop computer, CCMM efficiently solves convex clustering problems
featuring over one million objects in seven-dimensional space, achieving a
solution time of 51 seconds on average.
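For reference, the convex clustering objective that such algorithms solve is typically written as follows (standard form with our notation; the penalty weight $\lambda$ and pairwise weights $w_{ij}$ are the usual tuning quantities):

```latex
\min_{U} \; \frac{1}{2} \sum_{i=1}^{n} \| x_i - u_i \|_2^2
  + \lambda \sum_{i < j} w_{ij} \| u_i - u_j \|_2
```

Cluster fusions occur when the penalty drives centroids $u_i$ and $u_j$ to coincide; increasing $\lambda$ traces out the hierarchical path from $n$ clusters toward one.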
( 2
min )
Pufferfish privacy is a flexible generalization of differential privacy that
allows one to model arbitrary secrets and an adversary's prior knowledge about
the data. Unfortunately, designing general and tractable Pufferfish mechanisms
that do not compromise utility is challenging. Furthermore, this framework does
not provide the composition guarantees needed for direct use in iterative machine
learning algorithms. To mitigate these issues, we introduce a R\'enyi
divergence-based variant of Pufferfish and show that it allows us to extend the
applicability of the Pufferfish framework. We first generalize the Wasserstein
mechanism to cover a wide range of noise distributions and introduce several
ways to improve its utility. We also derive stronger guarantees against
out-of-distribution adversaries. Finally, as an alternative to composition, we
prove privacy amplification results for contractive noisy iterations and
showcase the first use of Pufferfish in private convex optimization. A common
ingredient underlying our results is the use and extension of shift reduction
lemmas.
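For reference, the Rényi divergence of order $\alpha \in (1, \infty)$ on which such variants are built is (standard definition, not specific to this paper):

```latex
D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1}
  \log \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, \mathrm{d}x
```

As $\alpha \to 1$ this recovers the Kullback-Leibler divergence, which is why Rényi-based privacy notions interpolate toward standard divergence-based guarantees.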
( 2
min )
Large language model (LLM) training has surged in popularity over the last year with the release of several popular models such as Llama 2, Falcon, and Mistral. Customers are now pre-training and fine-tuning LLMs ranging from 1 billion to over 175 billion parameters to optimize model performance for applications across industries, from healthcare to finance […]
( 9
min )
Today, we are excited to announce that the Mixtral-8x7B large language model (LLM), developed by Mistral AI, is available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. The Mixtral-8x7B LLM is a pre-trained sparse mixture of expert model, based on a 7-billion parameter backbone with eight experts per feed-forward […]
( 11
min )
This blog is co-written with Josh Reini, Shayak Sen and Anupam Datta from TruEra. Amazon SageMaker JumpStart provides a variety of pretrained foundation models, such as Llama-2 and Mistral 7B, that can be quickly deployed to an endpoint. These foundation models perform well with generative tasks, from crafting text and summaries and answering questions to producing […]
( 12
min )
Generative AI agents are capable of producing human-like responses and engaging in natural language conversations by orchestrating a chain of calls to foundation models (FMs) and other augmenting tools based on user input. Instead of only fulfilling predefined intents through a static decision tree, agents are autonomous within the context of their suite of available […]
( 15
min )
As I completed this blog series, the European Union (EU) announced its AI Regulation Act, which seeks to ensure AI’s ethical and safe deployment in the EU. Coming on the heels of the White House’s “Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence,” we…
The post Creating a More Fair, Just, and Prosperous Brave New World with AI Summary appeared first on Data Science Central.
( 21
min )
Master's students Irene Terpstra ’23 and Rujul Gandhi ’22 use language to design new integrated circuits and make it understandable to robots.
( 9
min )
AI saw unparalleled growth in 2023, reaching millions of people daily. This progress owes much to the extensive work of Microsoft researchers and collaborators. In this review, learn about the advances made in 2023, which set the stage for further progress in 2024.
The post Research at Microsoft 2023: A year of groundbreaking AI advances and discoveries appeared first on Microsoft Research.
( 17
min )
Quantization replaces floating point arithmetic with integer arithmetic in
deep neural network models, providing more efficient on-device inference with
less power and memory. In this work, we propose a framework for formally
verifying properties of quantized neural networks. Our baseline technique is
based on integer linear programming which guarantees both soundness and
completeness. We then show how efficiency can be improved by utilizing
gradient-based heuristic search methods and also bound-propagation techniques.
We evaluate our approach on perception networks quantized with PyTorch. Our
results show that we can verify quantized networks with better scalability and
efficiency than the previous state of the art.
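As an illustration of what "sound and complete" means here, a toy quantized layer over a small integer input domain can be verified exhaustively; the paper's integer linear programming encoding achieves the same guarantee without enumeration. The weights and the property below are invented for illustration.

```python
import itertools
import numpy as np

# A tiny integer (quantized) linear + ReLU layer with invented weights.
W = np.array([[2, -1], [1, 3]])
b = np.array([0, -2])

def qlayer(x):
    return np.maximum(W @ x + b, 0)

# Property to verify: for all integer inputs in [-4, 4]^2,
# every output stays within [0, 20].
violations = [x for x in itertools.product(range(-4, 5), repeat=2)
              if np.any(qlayer(np.array(x)) > 20)]
# An empty list is a proof of the property over this finite domain.
```

Because quantized inputs and weights range over finite integer sets, such properties are decidable exactly; the engineering challenge the paper addresses is doing this at scale.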
( 2
min )
Deep generative models, such as diffusion models, GANs, and IMLE, have shown
impressive capability in tackling inverse problems. However, the validity of
model-generated solutions w.r.t. the forward problem and the reliability of
associated uncertainty estimates remain understudied. This study evaluates
recent diffusion-based, GAN-based, and IMLE-based methods on three inverse
problems, i.e., $16\times$ super-resolution, colourization, and image
decompression. We assess the validity of these models' outputs as solutions to
the inverse problems and conduct a thorough analysis of the reliability of the
models' estimates of uncertainty over the solution. Overall, we find that the
IMLE-based CHIMLE method outperforms other methods in terms of producing valid
solutions and reliable uncertainty estimates.
( 2
min )
The selection of the assumed effect size (AES) critically determines the
duration of an experiment, and hence its accuracy and efficiency.
Traditionally, experimenters determine AES based on domain knowledge. However,
this method becomes impractical for online experimentation services managing
numerous experiments, and a more automated approach is hence in great demand.
We initiate the study of data-driven AES selection for online experimentation
services by introducing two solutions. The first employs a
three-layer Gaussian Mixture Model considering the heteroskedasticity across
experiments, and it seeks to estimate the true expected effect size among
positive experiments. The second method, grounded in utility theory, aims to
determine the optimal effect size by striking a balance between the
experiment's cost and the precision of decision-making. Through comparisons
with baseline methods using both simulated and real data, we showcase the
superior performance of the proposed approaches.
( 2
min )
Fusing measurements from multiple, heterogeneous, partial sources observing a
common object or process is increasingly challenging as the number and types of
available sensors grow. In this work we propose, implement, and validate an
end-to-end computational pipeline in the form of a
validate an end-to-end computational pipeline in the form of a
multiple-auto-encoder neural network architecture for this task. The inputs to
the pipeline are several sets of partial observations, and the result is a
globally consistent latent space, harmonizing (rigidifying, fusing) all
measurements. The key enabler is the availability of multiple slightly
perturbed measurements of each instance: local measurement "bursts" that allow
us to estimate the local distortion induced by each instrument. We
demonstrate the approach in a sequence of examples, starting with simple
two-dimensional data sets and proceeding to a Wi-Fi localization problem and to
the solution of a "dynamical puzzle" arising in spatio-temporal observations of
the solutions of Partial Differential Equations.
( 2
min )
We introduce the Efficient Title Reranker via Broadcasting Query Encoder (ETR),
a novel technique that reranks titles 20x-40x faster than a vanilla passage
reranker. However, one challenge in training the Efficient Title Reranker is
instability. Analyzing the issue, we found that some very difficult ground
truths can act as noisy labels, causing accuracy to drop, and that extreme
values in the model's probability output can produce NaNs. To address these
issues, we introduce the Sigmoid Trick, a novel technique that reduces the
gradient update in both cases, resulting in better retrieval efficacy.
Experiments demonstrate the effectiveness of ETR and the Sigmoid Trick, with
which we achieved four state-of-the-art positions on the KILT knowledge
benchmark.
( 2
min )
We present a novel approach to non-convex optimization with certificates,
which handles smooth functions on the hypercube or on the torus. Unlike
traditional methods that rely on algebraic properties, our algorithm exploits
the regularity of the target function intrinsic in the decay of its Fourier
spectrum. By defining a tractable family of models, we are able both to obtain
precise certificates and to leverage the advanced and powerful computational
techniques developed to optimize neural networks. In this way, the
scalability of our approach is naturally enhanced by parallel computing with
GPUs. Our approach, when applied to the case of polynomials of moderate
dimensions but with thousands of coefficients, outperforms state-of-the-art
optimization methods with certificates, such as those based on Lasserre's
hierarchy, addressing problems that are intractable for the competitors.
( 2
min )
Neural networks are powerful tools in various applications, and quantifying
their uncertainty is crucial for reliable decision-making. In the deep learning
field, the uncertainties are usually categorized into aleatoric (data) and
epistemic (model) uncertainty. In this paper, we point out that the existing
popular variance attenuation method highly overestimates aleatoric uncertainty.
To address this issue, we propose a new estimation method by actively
de-noising the observed data. By conducting a broad range of experiments, we
demonstrate that our proposed approach provides a much closer approximation to
the actual data uncertainty than the standard method.
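For context, the variance-attenuation baseline the abstract critiques is the heteroscedastic Gaussian negative log-likelihood, in which the network predicts a per-sample mean and log-variance. The sketch below uses toy values; the loss form itself is the widely used standard.

```python
import numpy as np

def variance_attenuation_nll(y, mu, log_var):
    # Heteroscedastic Gaussian NLL (up to a constant): a large predicted
    # variance "attenuates" the squared error but is penalized by log_var.
    return np.mean(0.5 * np.exp(-log_var) * (y - mu) ** 2 + 0.5 * log_var)

y = np.array([1.0, 2.0, 3.0])
mu = np.array([1.1, 1.9, 3.2])
log_var = np.zeros(3)  # unit predicted variance for each sample
loss = variance_attenuation_nll(y, mu, log_var)
```

The abstract's argument is that the log_var head absorbs not only data noise but also model error, inflating the aleatoric estimate; de-noising the observations is their proposed correction.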
( 2
min )
Generative Adversarial Networks (GANs) have become a ubiquitous technology
for data generation, with their prowess in image generation being
well-established. However, their application in generating tabular data has
been less than ideal. Furthermore, attempting to incorporate differential
privacy technology into these frameworks has often resulted in a degradation of
data utility. To tackle these challenges, this paper introduces DP-SACTGAN, a
novel Conditional Generative Adversarial Network (CGAN) framework for
differentially private tabular data generation. Experimental findings
demonstrate that DP-SACTGAN not only
accurately models the distribution of the original data but also effectively
satisfies the requirements of differential privacy.
( 2
min )
Measurement-based quantum computation (MBQC) is a paradigm for quantum
computation where computation is driven by local measurements on a suitably
entangled resource state. In this work we show that MBQC is related to a model
of quantum computation based on Clifford quantum cellular automata (CQCA).
Specifically, we show that certain MBQCs can be directly constructed from
CQCAs, which yields a simple and intuitive circuit-model representation of MBQC
in terms of quantum computation based on CQCA. We apply this description to
construct various MBQC-based Ans\"atze for parameterized quantum circuits,
demonstrating that the different Ans\"atze may lead to significantly different
performances on different learning tasks. In this way, MBQC yields a family of
Hardware-efficient Ans\"atze that may be adapted to specific problem settings
and is particularly well suited for architectures with translationally
invariant gates such as neutral atoms.
( 2
min )
External control arms (ECA) can inform the early clinical development of
experimental drugs and provide efficacy evidence for regulatory approval in
non-randomized settings. However, the main challenge of implementing ECA lies
in accessing real-world data or historical clinical trials. Indeed, data
sharing is often not feasible due to privacy considerations related to data
leaving the original collection centers, along with pharmaceutical companies'
competitive motives. In this paper, we leverage a privacy-enhancing technology
called federated learning (FL) to remove some of the barriers to data sharing.
We introduce a federated learning inverse probability of treatment weighted
(IPTW) method for time-to-event outcomes called FedECA which eases the
implementation of ECA by limiting patients' data exposure. We show with
extensive experiments that FedECA outperforms its closest competitor,
matching-adjusted indirect comparison (MAIC), in terms of statistical power and
ability to balance the treatment and control groups. To encourage the use of
such methods, we publicly release our code which relies on Substra, an
open-source FL software with proven experience in privacy-sensitive contexts.
( 3
min )
Text segmentation, the task of dividing a document into sections, is often a
prerequisite for performing additional natural language processing tasks.
Existing text segmentation methods have typically been developed and tested
using clean, narrative-style text with segments containing distinct topics.
Here we consider a challenging text segmentation task: dividing newspaper
marriage announcement lists into units of one announcement each. In many cases
the information is not structured into sentences, and adjacent segments are not
topically distinct from each other. In addition, the text of the announcements,
which is derived from images of historical newspapers via optical character
recognition, contains many typographical errors. As a result, these
announcements are not amenable to segmentation with existing techniques. We
present a novel deep learning-based model for segmenting such text and show
that it significantly outperforms an existing state-of-the-art method on our
task.
( 2
min )
We propose a novel machine learning method for sampling from the
high-dimensional probability distributions of Lattice Field Theories, which is
based on a single neural ODE layer and incorporates the full symmetries of the
problem. We test our model on the $\phi^4$ theory, showing that it
systematically outperforms previously proposed flow-based methods in sampling
efficiency, and the improvement is especially pronounced for larger lattices.
Furthermore, we demonstrate that our model can learn a continuous family of
theories at once, and the results of learning can be transferred to larger
lattices. Such generalizations further accentuate the advantages of machine
learning methods.
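For context, one common lattice discretization of the $\phi^4$ action is shown below; conventions vary, and the paper's parameterization may differ.

```latex
S[\phi] = \sum_{x} \Big[ \tfrac{1}{2} \sum_{\mu=1}^{d}
  \big( \phi_{x+\hat{\mu}} - \phi_x \big)^2
  + \tfrac{m^2}{2}\, \phi_x^2 + \lambda\, \phi_x^4 \Big]
```

The sampling target is the Boltzmann distribution $p(\phi) \propto e^{-S[\phi]}$, whose lattice symmetries (translations, reflections, $\phi \to -\phi$) are the ones a symmetry-aware model can build in.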
( 2
min )
Neural-network-based image- and video-quality metrics now outperform
traditional methods. However, they have also become more vulnerable to
adversarial attacks that increase metrics' scores without
improving visual quality. The existing benchmarks of quality metrics compare
their performance in terms of correlation with subjective quality and
calculation time. However, the adversarial robustness of image-quality metrics
is also an area worth researching. In this paper, we analyse modern metrics'
robustness to different adversarial attacks. We adopted adversarial attacks
from computer vision tasks and compared attacks' efficiency against 15
no-reference image/video-quality metrics. Some metrics showed high resistance
to adversarial attacks, which makes them safer to use in benchmarks than
vulnerable metrics. The benchmark accepts new metric submissions from
researchers who want to make their metrics more robust to attacks or to find
such metrics for their needs. Try our benchmark using pip install
robustness-benchmark.
( 2
min )
We propose to learn non-convex regularizers with a prescribed upper bound on
their weak-convexity modulus. Such regularizers give rise to variational
denoisers that minimize a convex energy. They rely on few parameters (fewer
than 15,000) and offer a signal-processing interpretation, as they mimic handcrafted
sparsity-promoting regularizers. Through numerical experiments, we show that
such denoisers outperform convex-regularization methods as well as the popular
BM3D denoiser. Additionally, the learned regularizer can be deployed to solve
inverse problems with iterative schemes that provably converge. For both CT and
MRI reconstruction, the regularizer generalizes well and offers an excellent
tradeoff between performance, number of parameters, guarantees, and
interpretability when compared to other data-driven approaches.
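The key definition, stated in standard form (notation ours): a regularizer $R$ is $\rho$-weakly convex if

```latex
x \mapsto R(x) + \frac{\rho}{2} \| x \|_2^2 \quad \text{is convex,}
```

so with the modulus bounded by $\rho \le 1$, the denoising energy $\frac{1}{2}\|x - y\|_2^2 + R(x)$ remains convex even though $R$ itself need not be.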
( 2
min )
Recent studies show that deep reinforcement learning (DRL) agents tend to
overfit to the task on which they were trained and fail to adapt to minor
environment changes. To expedite learning when transferring to unseen tasks, we
propose a novel approach to representing the current task using reward machines
(RMs), state machine abstractions that induce subtasks based on the current
task's rewards and dynamics. Our method provides agents with symbolic
representations of optimal transitions from their current abstract state and
rewards them for achieving these transitions. These representations are shared
across tasks, allowing agents to exploit knowledge of previously encountered
symbols and transitions, thus enhancing transfer. Empirical results show that
our representations improve sample efficiency and few-shot transfer in a
variety of domains.
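A reward machine can be sketched as a finite state machine over symbolic events that emits rewards on transitions. The "get key, then open door" task below is a hypothetical example for illustration, not one from the paper.

```python
class RewardMachine:
    """Finite state machine mapping (state, event) to (next state, reward)."""

    def __init__(self, transitions, start):
        self.transitions = transitions
        self.state = start

    def step(self, event):
        # Events with no matching transition leave the state unchanged
        # and yield zero reward.
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward

rm = RewardMachine({("u0", "key"): ("u1", 0.1),
                    ("u1", "door"): ("u_goal", 1.0)}, start="u0")
rewards = [rm.step(e) for e in ["door", "key", "door"]]
# The door yields no reward until the key has been collected.
```

Because the abstract states (u0, u1, u_goal) and event symbols are shared across tasks, an agent that has learned what "key" leads to in one task can reuse that knowledge in another, which is the transfer mechanism the abstract describes.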
( 2
min )
We propose a simple and general framework for nonparametric estimation of
heterogeneous treatment effects under fairness constraints. Under standard
regularity conditions, we show that the resulting estimators possess the double
robustness property. We use this framework to characterize the trade-off
between fairness and the maximum welfare achievable by the optimal policy. We
evaluate the methods in a simulation study and illustrate them in a real-world
case study.
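The double robustness property typically comes from estimators built on the augmented inverse-propensity-weighted (AIPW) pseudo-outcome, shown here in standard form; the paper adds fairness constraints on top of such constructions.

```latex
\hat{\varphi}(Z) = \hat{\mu}_1(X) - \hat{\mu}_0(X)
  + \frac{A\,\big(Y - \hat{\mu}_1(X)\big)}{\hat{e}(X)}
  - \frac{(1 - A)\,\big(Y - \hat{\mu}_0(X)\big)}{1 - \hat{e}(X)}
```

Here $\hat{\mu}_a$ are the outcome regressions, $\hat{e}$ is the propensity score, and the estimator remains consistent if either nuisance model is correctly specified.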
( 2
min )
The recent popularity of text-to-image diffusion models (DM) can largely be
attributed to the intuitive interface they provide to users. The intended
generation can be expressed in natural language, with the model producing
faithful interpretations of text prompts. However, expressing complex or
nuanced ideas in text alone can be difficult. To ease image generation, we
propose MultiFusion that allows one to express complex and nuanced concepts
with arbitrarily interleaved inputs of multiple modalities and languages.
MultiFusion leverages pre-trained models and aligns them for integration into a
cohesive system, thereby avoiding the need for extensive training from scratch.
Our experimental results demonstrate the efficient transfer of capabilities
from individual modules to the downstream model. Specifically, the fusion of
all independent components allows the image generation module to utilize
multilingual, interleaved multimodal inputs despite being trained solely on
monomodal data in a single language.
( 2
min )
Focusing on stochastic programming (SP) with covariate information, this
paper proposes an empirical risk minimization (ERM) method embedded within a
nonconvex piecewise affine decision rule (PADR), which aims to learn the direct
mapping from features to optimal decisions. We establish the nonasymptotic
consistency result of our PADR-based ERM model for unconstrained problems and
asymptotic consistency result for constrained ones. To solve the nonconvex and
nondifferentiable ERM problem, we develop an enhanced stochastic
majorization-minimization algorithm and establish the asymptotic convergence to
(composite strong) directional stationarity along with complexity analysis. We
show that the proposed PADR-based ERM method applies to a broad class of
nonconvex SP problems with theoretical consistency guarantees and computational
tractability. Our numerical study demonstrates the superior performance of
PADR-based ERM methods compared to state-of-the-art approaches under various
settings, with significantly lower costs, less computation time, and robustness
to feature dimensions and nonlinearity of the underlying dependency.
( 2
min )
Time Series Classification and Extrinsic Regression are important and
challenging machine learning tasks. Deep learning has revolutionized natural
language processing and computer vision and holds great promise in other fields
such as time series analysis where the relevant features must often be
abstracted from the raw data but are not known a priori. This paper surveys the
current state of the art in the fast-moving field of deep learning for time
series classification and extrinsic regression. We review different network
architectures and training methods used for these tasks and discuss the
challenges and opportunities when applying deep learning to time series data.
We also summarize two critical applications of time series classification and
extrinsic regression, human activity recognition and satellite earth
observation.
( 2
min )
To mitigate global warming, greenhouse gas sources need to be resolved at a
high spatial resolution and monitored in time to ensure the reduction and
ultimately elimination of the pollution source. However, the computational
complexity of resolving high-resolution wind fields leaves simulations
impractical for testing different time lengths and model configurations. This study
presents a preliminary development of a physics-informed super-resolution (SR)
generative adversarial network (GAN) that super-resolves the three-dimensional
(3D) low-resolution wind fields, upscaling them by a factor of 9. We develop a pixel-wise
self-attention (PWA) module that learns 3D weather dynamics via a
self-attention computation followed by a 2D convolution. We also employ a loss
term that regularizes the self-attention map during pretraining, capturing the
vertical convection process from input wind data. The new PWA SR-GAN produces
high-fidelity super-resolved 3D wind data, learns wind structure in the
high-frequency domain, and reduces the computational cost of a high-resolution
wind simulation by a factor of 89.7.
( 2
min )
This paper introduces Structured Noise Space GAN (SNS-GAN), a novel approach
in the field of generative modeling specifically tailored for class-conditional
generation in both image and time series data. It addresses the challenge of
effectively integrating class labels into generative models without requiring
structural modifications to the network. The SNS-GAN method embeds class
conditions within the generator's noise space, simplifying the training process
and enhancing model versatility. The model's efficacy is demonstrated through
qualitative validations in the image domain and superior performance in time
series generation compared to baseline models. This research opens new avenues
for the application of GANs in various domains, including but not limited to
time series and image data generation.
( 2
min )
In this work, we study the problem of stability of Graph Convolutional Neural
Networks (GCNs) under random small perturbations in the underlying graph
topology, i.e. under a limited number of insertions or deletions of edges. We
derive a novel bound on the expected difference between the outputs of
unperturbed and perturbed GCNs. The proposed bound explicitly depends on the
magnitude of the perturbation of the eigenpairs of the Laplacian matrix, and
the perturbation explicitly depends on which edges are inserted or deleted.
Then, we provide a quantitative characterization of the effect of perturbing
specific edges on the stability of the network. We leverage tools from small
perturbation analysis to express the bounds in closed, albeit approximate,
form, in order to enhance interpretability of the results, without the need to
compute any perturbed shift operator. Finally, we numerically evaluate the
effectiveness of the proposed bound.
( 2
min )
We propose an energy-efficient equalizer for IM/DD systems based on spiking
neural networks. We optimize a neural spike encoding that boosts the
equalizer's performance while decreasing energy consumption.
( 2
min )
Deep reinforcement learning has advanced greatly and been applied in many
areas. In this paper, we explore the vulnerability of deep reinforcement
learning by proposing a novel generative model for creating effective
adversarial examples to attack the agent. Our proposed model can perform both
targeted and untargeted attacks. Given the specificity of deep reinforcement
learning, we propose the action consistency ratio as a measure of stealthiness,
along with a new index that measures both effectiveness and stealthiness.
Experimental results show that our method ensures the effectiveness and
stealthiness of attacks compared with other algorithms. Moreover, our methods
are considerably faster and can thus achieve rapid and efficient verification
of the vulnerability of deep reinforcement learning.
( 2
min )
Motivated by the interpretability question in ML models as a crucial element
for the successful deployment of AI systems, this paper focuses on rule
extraction as a means of making neural networks interpretable. Through a
systematic literature review, different approaches for extracting rules from
feedforward neural networks, an important block in deep learning models, are
identified and explored. The findings reveal a range of methods developed over
more than two decades, mostly suitable for shallow neural networks, with recent
developments addressing the challenges of deep learning models. Rules offer a
transparent and intuitive means of explaining neural networks, making this
study a comprehensive introduction for researchers interested in the field.
While the study specifically addresses feedforward networks with supervised
learning and crisp rules, future work can extend to other network types,
machine learning methods, and fuzzy rule extraction.
( 2
min )
Exponential families are statistical models which are the workhorses in
statistics, information theory, and machine learning. An exponential family can
either be normalized subtractively by its cumulant function or equivalently
normalized divisively by its partition function. Both subtractive and divisive
normalizers are strictly convex and smooth functions inducing pairs of Bregman
and Jensen divergences. It is well known that skewed Bhattacharyya distances
between probability densities of an exponential family amount to skewed Jensen
divergences induced by the cumulant function between their corresponding
natural parameters, and that in limit cases the sided Kullback-Leibler
divergences amount to reverse-sided Bregman divergences. In this note, we first
show that the $\alpha$-divergences between unnormalized densities of an
exponential family amount to scaled $\alpha$-skewed Jensen divergences induced
by the partition function. We then show how comparative convexity with respect
to a pair of quasi-arithmetic means allows one to deform convex functions and
define dually flat spaces with corresponding divergences when ordinary
convexity is preserved.
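For reference, the $\alpha$-skewed Jensen divergence induced by a strictly convex function $F$ (here the partition function) between natural parameters is (standard definition):

```latex
J_F^{\alpha}(\theta_1 : \theta_2) = \alpha F(\theta_1)
  + (1 - \alpha) F(\theta_2) - F\big(\alpha \theta_1 + (1 - \alpha)\theta_2\big)
```

Convexity of $F$ guarantees $J_F^{\alpha} \ge 0$, and the note's result identifies $\alpha$-divergences between unnormalized densities with scaled versions of this quantity.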
( 2
min )
This paper studies bandit problems where an agent has access to offline data
that might be utilized to potentially improve the estimation of each arm's
reward distribution. A major obstacle in this setting is the existence of
compound biases from the observational data. Ignoring these biases and blindly
fitting a model with the biased data could even negatively affect the online
learning phase. In this work, we formulate this problem from a causal
perspective. First, we categorize the biases into confounding bias and
selection bias based on the causal structure they imply. Next, we extract the
causal bound for each arm that is robust towards compound biases from biased
observational data. The derived bounds contain the ground truth mean reward and
can effectively guide the bandit agent to learn a nearly-optimal decision
policy. We also conduct regret analysis in both contextual and non-contextual
bandit settings and show that prior causal bounds could help consistently
reduce the asymptotic regret.
( 2
min )
Graph clustering is a fundamental and challenging task in the field of graph
mining where the objective is to group the nodes into clusters taking into
consideration the topology of the graph. It has several applications in diverse
domains spanning social network analysis, recommender systems, computer vision,
and bioinformatics. In this work, we propose a novel method, DGCluster, which
primarily optimizes the modularity objective using graph neural networks and
scales linearly with the graph size. Our method does not require the number of
clusters to be specified as a part of the input and can also leverage the
availability of auxiliary node level information. We extensively test DGCluster
on several real-world datasets of varying sizes, across multiple popular
cluster quality metrics. Our approach consistently outperforms the
state-of-the-art methods, demonstrating significant performance gains in almost
all settings.
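The modularity objective that DGCluster optimizes can be evaluated for a hard clustering as follows; DGCluster itself works with soft GNN assignments, and the toy graph below is ours.

```python
import numpy as np

def modularity(A, labels):
    # Q = (1 / 2m) * sum over same-cluster pairs of (A_ij - k_i * k_j / (2m))
    m = A.sum() / 2.0          # number of edges (A symmetric, binary)
    k = A.sum(axis=1)          # node degrees
    q = 0.0
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] == labels[j]:
                q += A[i, j] - k[i] * k[j] / (2 * m)
    return q / (2 * m)

# Two triangles joined by a single edge: a natural 2-cluster structure.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
Q = modularity(A, [0, 0, 0, 1, 1, 1])  # positive Q: denser than chance within clusters
```

A positive Q means within-cluster edges are denser than a degree-matched random graph would predict; maximizing Q over assignments is the (NP-hard) objective the GNN relaxes.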
( 2
min )
Designing studies that apply causal discovery requires navigating many
researcher degrees of freedom. This complexity is exacerbated when the study
involves fMRI data. In this paper we (i) describe nine challenges that occur
when applying causal discovery to fMRI data, (ii) discuss the space of
decisions that need to be made, (iii) review how a recent case study made those
decisions, and (iv) identify existing gaps that could potentially be solved by
the development of new methods. Overall, causal discovery is a promising
approach for analyzing fMRI data, and multiple successful applications have
indicated that it is superior to traditional fMRI functional connectivity
methods, but current causal discovery methods for fMRI leave room for
improvement.
( 2
min )
Multi-fidelity Bayesian Optimisation (MFBO) has been shown to generally
converge faster than single-fidelity Bayesian Optimisation (SFBO) (Poloczek et
al. (2017)). Inspired by recent benchmark papers, we investigate the long-run
behaviour of MFBO, motivated by observations in the literature that it might
under-perform in certain scenarios (Mikkola et al. (2023); Eggensperger et al.
(2021)). Under-performance of MFBO in the long run could significantly
undermine its application to many research tasks, especially when we are not
able to identify when the under-performance begins. We present a simple
benchmark study, report empirical results, and discuss scenarios and possible
reasons for under-performance.
( 2
min )
This work presents the PORTALS framework, which leverages surrogate modeling
and optimization techniques to enable the prediction of core plasma profiles
and performance with nonlinear gyrokinetic simulations at significantly reduced
cost, with no loss of accuracy. The efficiency of PORTALS is benchmarked
against standard methods, and its full potential is demonstrated on a unique,
simultaneous 5-channel (electron temperature, ion temperature, electron
density, impurity density and angular rotation) prediction of steady-state
profiles in a DIII-D ITER Similar Shape plasma with GPU-accelerated, nonlinear
CGYRO. This paper also provides general guidelines for accurate performance
predictions in burning plasmas and discusses the impact of transport modeling
in fusion pilot-plant studies.
( 2
min )
Fairness AI aims to detect and alleviate bias across the entire AI
development life cycle, encompassing data curation, modeling, evaluation, and
deployment, a pivotal aspect of ethical AI implementation. For addressing data
bias, particularly bias concerning sensitive attributes like gender and race,
reweighting samples proves an efficient approach. This paper contributes a
systematic examination of sample reweighting for traditional machine learning
(ML) models, employing five models for binary classification on the Adult
Income and COMPAS datasets with various protected attributes. The study evaluates
prediction results using five fairness metrics, uncovering the nuanced and
model-specific nature of reweighting sample effectiveness in achieving fairness
in traditional ML models, as well as revealing the complexity of bias dynamics.
( 2
min )
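The abstract does not spell out the reweighting scheme; a common choice in this setting is Kamiran-Calders reweighing, where each (group, label) cell receives weight P(a)P(y)/P(a,y), shown here as a hedged sketch:

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Kamiran-Calders reweighing: w(a, y) = P(a) * P(y) / P(a, y).
    After weighting, group membership and label are statistically
    independent in the weighted sample, which removes the observed
    association between the protected attribute and the outcome."""
    n = len(groups)
    p_a = Counter(groups)
    p_y = Counter(labels)
    p_ay = Counter(zip(groups, labels))
    return [
        (p_a[a] / n) * (p_y[y] / n) / (p_ay[(a, y)] / n)
        for a, y in zip(groups, labels)
    ]
```

The weights are then passed as per-sample weights to any classifier that supports them.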
Motivated by recent work on lifelong learning applications for language
models (LMs) of code, we introduce CodeLL, a lifelong learning dataset focused
on code changes. Our contribution addresses a notable research gap marked by
the absence of a long-term temporal dimension in existing code change datasets,
limiting their suitability in lifelong learning scenarios. In contrast, our
dataset aims to comprehensively capture code changes across the entire release
history of open-source software repositories. In this work, we introduce an
initial version of CodeLL, comprising 71 machine-learning-based projects mined
from Software Heritage. This dataset enables the extraction and in-depth
analysis of code changes spanning 2,483 releases at both the method and API
levels. CodeLL enables researchers to study the behaviour of LMs in lifelong
fine-tuning settings for learning code changes. Additionally, the dataset can
support the study of data-distribution shifts within software repositories and
of the evolution of API usage over time.
( 2
min )
This paper explores the feasibility and performance of on-device large
language model (LLM) inference on various Apple iPhone models. Amidst the rapid
evolution of generative AI, on-device LLMs offer solutions to privacy,
security, and connectivity challenges inherent in cloud-based models.
Leveraging existing literature on running multi-billion parameter LLMs on
resource-limited devices, our study examines the thermal effects and
interaction speeds of a high-performing LLM across different smartphone
generations. We present real-world performance results, providing insights into
on-device inference capabilities.
( 2
min )
Neural construction models have shown promising performance for Vehicle
Routing Problems (VRPs) by adopting either the Autoregressive (AR) or
Non-Autoregressive (NAR) learning approach. While AR models produce
high-quality solutions, they generally have a high inference latency due to
their sequential generation nature. Conversely, NAR models generate solutions
in parallel with a low inference latency but generally exhibit inferior
performance. In this paper, we propose a generic Guided Non-Autoregressive
Knowledge Distillation (GNARKD) method to obtain high-performance NAR models
having a low inference latency. GNARKD removes the constraint of sequential
generation in AR models while preserving the learned pivotal components in the
network architecture to obtain the corresponding NAR models through knowledge
distillation. We evaluate GNARKD by applying it to three widely adopted AR
models to obtain NAR VRP solvers for both synthesized and real-world instances.
The experimental results demonstrate that GNARKD significantly reduces the
inference time (4-5 times faster) with an acceptable performance drop (2-3%).
To the best of our knowledge, this study is the first to obtain NAR VRP
solvers from AR ones through knowledge distillation.
( 2
min )
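The core of such a distillation objective can be sketched as a per-step KL divergence between the AR teacher's token distributions and the NAR student's parallel predictions; the guided-decoding details and architecture transfer are not shown and are assumptions of this sketch:

```python
import math

def softmax(logits):
    mx = max(logits)
    exps = [math.exp(x - mx) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits):
    """Mean per-step KL(teacher || student): the teacher produces one
    distribution per decoding step sequentially, while the student
    predicts all steps in parallel and is trained to match them."""
    total = 0.0
    for t_log, s_log in zip(teacher_logits, student_logits):
        p = softmax(t_log)
        q = softmax(s_log)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(teacher_logits)
```

The loss is zero when the student matches the teacher exactly and positive otherwise.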
We present a study on the integration of Large Language Models (LLMs) in
tabular data classification, emphasizing an efficient framework. Building upon
existing work done in TabLLM (arXiv:2210.10723), we introduce three novel
serialization techniques, including the standout LaTeX serialization method.
This method significantly boosts the performance of LLMs in processing
domain-specific datasets. Our method stands out for its memory efficiency and
ability to fully utilize complex data structures. Through extensive
experimentation, including various serialization approaches like feature
combination and importance, we demonstrate our work's superiority in accuracy
and efficiency over traditional models.
( 2
min )
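The exact serialization format is not specified in the abstract; one plausible sketch of a LaTeX serialization renders each tabular record as a small LaTeX table for the LLM prompt (the header/value layout here is an assumption):

```python
def serialize_row_latex(row):
    """Serialize one tabular record (a dict of column -> value) as a LaTeX
    table snippet: a header row, an \\hline separator, and the value row."""
    headers = " & ".join(row.keys())
    values = " & ".join(str(v) for v in row.values())
    return (
        "\\begin{tabular}{" + "l" * len(row) + "}\n"
        + headers + " \\\\\n\\hline\n"
        + values + " \\\\\n\\end{tabular}"
    )
```

The resulting string is prepended to the classification instruction in the prompt.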
Drivers can sustain serious injuries in traffic accidents. In this study,
traffic crashes on Florida's Interstate-95 from 2016 to 2021 were gathered, and
several classification methods were used to estimate the severity of driver
injuries. Logistic regression was applied for feature selection. To compare
model performances, various assessment metrics such as accuracy, recall, and
area under the curve (AUC) were computed. The AdaBoost algorithm
outperformed the others in terms of recall and AUC. SHAP values were also
generated to explain the classification model's results. This analytical study
can be used to examine factors that contribute to the severity of driver
injuries in crashes.
( 2
min )
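The three assessment metrics named above are standard and can be computed directly; AUC is shown via its rank (Mann-Whitney) interpretation:

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count one half): the normalised Mann-Whitney U statistic."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```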
This paper presents a novel approach for analysing EEG data from drivers in a
simulated driving test. We focused on the Hurst exponent, Shannon entropy, and
fractal dimension as markers of the nonlinear dynamics of the brain. The
results show significant trends: Shannon entropy and fractal dimension exhibit
variations during driving-condition transitions, whereas the Hurst exponent
reflects memory retention, portraying learning patterns. These findings suggest
that the tools of Non-linear Dynamical (NLD) theory can serve as indicators of
cognitive state and driving-memory changes, assessing driver performance and
advancing the understanding of the non-linear dynamics of human cognition in
the context of driving and beyond. Our study reveals the potential of NLD tools to elucidate
brain state and system variances, enabling their integration into current Deep
Learning and Machine Learning models. This integration can extend beyond
driving applications and be harnessed for cognitive learning, thereby improving
overall productivity and accuracy levels.
( 2
min )
Focusing on stochastic programming (SP) with covariate information, this
paper proposes an empirical risk minimization (ERM) method embedded within a
nonconvex piecewise affine decision rule (PADR), which aims to learn the direct
mapping from features to optimal decisions. We establish the nonasymptotic
consistency result of our PADR-based ERM model for unconstrained problems and
asymptotic consistency result for constrained ones. To solve the nonconvex and
nondifferentiable ERM problem, we develop an enhanced stochastic
majorization-minimization algorithm and establish the asymptotic convergence to
(composite strong) directional stationarity along with complexity analysis. We
show that the proposed PADR-based ERM method applies to a broad class of
nonconvex SP problems with theoretical consistency guarantees and computational
tractability. Our numerical study demonstrates the superior performance of
PADR-based ERM methods compared to state-of-the-art approaches under various
settings, with significantly lower costs, less computation time, and robustness
to feature dimensions and nonlinearity of the underlying dependency.
( 2
min )
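A standard way to realise a nonconvex piecewise affine decision rule, and plausibly what the PADR parametrization looks like (the exact form is an assumption of this sketch), is a difference of two max-affine functions:

```python
def padr(x, pos_pieces, neg_pieces):
    """Evaluate a piecewise affine decision rule as a difference of two
    max-affine functions: d(x) = max_i (a_i . x + b_i) - max_j (c_j . x + e_j).
    Each piece is a (weights, bias) pair; subtracting one convex max-affine
    map from another yields a nonconvex piecewise affine function."""
    def max_affine(pieces):
        return max(sum(w * xi for w, xi in zip(ws, x)) + b for ws, b in pieces)
    return max_affine(pos_pieces) - max_affine(neg_pieces)
```

The ERM problem then fits the piece parameters to map observed features to near-optimal decisions.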
We propose a simple and general framework for nonparametric estimation of
heterogeneous treatment effects under fairness constraints. Under standard
regularity conditions, we show that the resulting estimators possess the double
robustness property. We use this framework to characterize the trade-off
between fairness and the maximum welfare achievable by the optimal policy. We
evaluate the methods in a simulation study and illustrate them in a real-world
case study.
( 2
min )
Researchers are increasingly turning to machine learning (ML) algorithms to
investigate causal heterogeneity in randomized experiments. Despite their
promise, ML algorithms may fail to accurately ascertain heterogeneous treatment
effects under practical settings with many covariates and small sample size. In
addition, the quantification of estimation uncertainty remains a challenge. We
develop a general approach to statistical inference for heterogeneous treatment
effects discovered by a generic ML algorithm. We apply Neyman's repeated
sampling framework to a common setting, in which researchers use an ML
algorithm to estimate the conditional average treatment effect and then divide
the sample into several groups based on the magnitude of the estimated effects.
We show how to estimate the average treatment effect within each of these
groups, and construct a valid confidence interval. In addition, we develop
nonparametric tests of treatment effect homogeneity across groups, and
rank-consistency of within-group average treatment effects. The validity of our
methodology does not rely on the properties of ML algorithms because it is
solely based on the randomization of treatment assignment and random sampling
of units. Finally, we generalize our methodology to the cross-fitting procedure
by accounting for the additional uncertainty induced by the random splitting of
data.
( 3
min )
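The within-group estimate described above is a difference in means with a Neyman-style conservative variance; a minimal sketch for a single group (using a normal critical value, an assumption for large groups):

```python
import math

def group_ate_ci(treated, control, z=1.96):
    """Difference-in-means ATE estimate within one group, with the
    conservative Neyman variance s_t^2/n_t + s_c^2/n_c and a normal CI."""
    def mean(xs):
        return sum(xs) / len(xs)
    def var(xs):  # unbiased sample variance
        mu = mean(xs)
        return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)
    ate = mean(treated) - mean(control)
    se = math.sqrt(var(treated) / len(treated) + var(control) / len(control))
    return ate, (ate - z * se, ate + z * se)
```

Validity rests only on randomized assignment and random sampling, not on properties of the ML algorithm that formed the groups.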
Recent advances in practical quantum computing have led to a variety of
cloud-based quantum computing platforms that allow researchers to evaluate
their algorithms on noisy intermediate-scale quantum (NISQ) devices. A common
property of quantum computers is that they can exhibit instances of true
randomness as opposed to pseudo-randomness obtained from classical systems.
Investigating the effects of such true quantum randomness in the context of
machine learning is appealing, and recent results vaguely suggest that benefits
can indeed be achieved from the use of quantum random numbers. To shed some
more light on this topic, we empirically study the effects of hardware-biased
quantum random numbers on the initialization of artificial neural network
weights in numerical experiments. We find no statistically significant
difference in comparison with unbiased quantum random numbers as well as biased
and unbiased random numbers from a classical pseudo-random number generator.
The quantum random numbers for our experiments are obtained from real quantum
hardware.
( 2
min )
The selection of the assumed effect size (AES) critically determines the
duration of an experiment, and hence its accuracy and efficiency.
Traditionally, experimenters determine AES based on domain knowledge. However,
this method becomes impractical for online experimentation services managing
numerous experiments, and a more automated approach is hence of great demand.
We initiate the study of data-driven AES selection for online
experimentation services by introducing two solutions. The first employs a
three-layer Gaussian Mixture Model considering the heteroskedasticity across
experiments, and it seeks to estimate the true expected effect size among
positive experiments. The second method, grounded in utility theory, aims to
determine the optimal effect size by striking a balance between the
experiment's cost and the precision of decision-making. Through comparisons
with baseline methods using both simulated and real data, we showcase the
superior performance of the proposed approaches.
( 2
min )
Measurement-based quantum computation (MBQC) is a paradigm for quantum
computation where computation is driven by local measurements on a suitably
entangled resource state. In this work we show that MBQC is related to a model
of quantum computation based on Clifford quantum cellular automata (CQCA).
Specifically, we show that certain MBQCs can be directly constructed from CQCAs
which yields a simple and intuitive circuit model representation of MBQC in
terms of quantum computation based on CQCA. We apply this description to
construct various MBQC-based Ansätze for parameterized quantum circuits,
demonstrating that the different Ansätze may lead to significantly different
performance on different learning tasks. In this way, MBQC yields a family of
hardware-efficient Ansätze that may be adapted to specific problem settings
and is particularly well suited for architectures with translationally
invariant gates such as neutral atoms.
( 2
min )
This paper introduces Structured Noise Space GAN (SNS-GAN), a novel approach
in the field of generative modeling specifically tailored for class-conditional
generation in both image and time series data. It addresses the challenge of
effectively integrating class labels into generative models without requiring
structural modifications to the network. The SNS-GAN method embeds class
conditions within the generator's noise space, simplifying the training process
and enhancing model versatility. The model's efficacy is demonstrated through
qualitative validations in the image domain and superior performance in time
series generation compared to baseline models. This research opens new avenues
for the application of GANs in various domains, including but not limited to
time series and image data generation.
( 2
min )
We establish explicit dynamics for neural networks whose training objective
has a regularising term that constrains the parameters to remain close to their
initial value. This keeps the network in a lazy training regime, where the
dynamics can be linearised around the initialisation. The standard neural
tangent kernel (NTK) governs the evolution during the training in the
infinite-width limit, although the regularisation introduces an additional
term in the differential equation describing the dynamics. This setting
provides an appropriate framework to study the evolution of wide networks
trained to optimise generalisation objectives such as PAC-Bayes bounds, and
hence potentially contribute to a deeper theoretical understanding of such
networks.
( 2
min )
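The lazy-regime dynamics can be sketched as follows; this is the standard linearisation for an objective of the form L(θ) + (λ/2)‖θ − θ₀‖², and the paper's exact regularised ODE may differ in detail:

```latex
\frac{\mathrm{d}\theta_t}{\mathrm{d}t}
  = -\nabla_{\theta} L(\theta_t) - \lambda\,(\theta_t - \theta_0),
\qquad
\frac{\mathrm{d}f_t(x)}{\mathrm{d}t}
  = -\,\Theta(x, X)\,\nabla_{f} L - \lambda\,\bigl(f_t(x) - f_0(x)\bigr),
```

where Θ is the neural tangent kernel, X the training inputs, and the final −λ(f_t − f_0) term is the additional contribution that the regularisation introduces into the otherwise standard NTK evolution.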
Great customer experience provides a competitive edge and helps create brand differentiation. As per the Forrester report, The State Of Customer Obsession, 2022, being customer-first can make a sizable impact on an organization’s balance sheet, as organizations embracing this methodology are surpassing their peers in revenue growth. Despite contact centers being under constant pressure to […]
( 10
min )
I asked DALL-E3 (via chatgpt) for "a simple Christmas nativity scene with each element clearly labeled in large capital letters for a child who is learning to read."
"Please generate a simple Christmas nativity scene with each element clearly labeled in large capital letters for a child
( 3
min )
AI Weirdness: the strange side of machine learning
( 2
min )
AI made a splash this year — from Wall Street to the U.S. Congress — driven by a wave of developers aiming to make the world better. Here’s a look at AI in 2023 across agriculture, natural disasters, medicine and other areas worthy of a cocktail party conversation. This AI Is on Fire: California has […]
( 7
min )
Time to gear up, hunters — Capcom’s Monster Hunter: World joins the GeForce NOW library, bringing members the ultimate hunting experience on any device. It’s all part of an adventurous week, with nearly a dozen new games joining the cloud gaming service. A Whole New World Join the Fifth Fleet on an epic adventure to […]
( 6
min )
We propose a Reinforcement-Learning-based system that would automatically
prescribe a hypothetical patient medications that may help the patient with
their mental-health-related speech disfluency, and adjust the medication and
the dosages in response to data from the patient. We demonstrate the components
of the system: a module that detects and evaluates speech disfluency on a large
dataset we built, and a Reinforcement Learning algorithm that automatically
finds good combinations of medications. To support the two modules, we collect
data on the effect of psychiatric medications for speech disfluency from the
literature, and build a plausible patient simulation system. We demonstrate
that the Reinforcement Learning system is, under some circumstances, able to
converge to a good medication regime. We collect and label a dataset of people
with possible speech disfluency and demonstrate our methods using that dataset.
Our work is a proof of concept: we show that there is promise in the idea of
using automatic data collection to address disfluency.
( 2
min )
We present XLand-MiniGrid, a suite of tools and grid-world environments for
meta-reinforcement learning research inspired by the diversity and depth of
XLand and the simplicity and minimalism of MiniGrid. XLand-MiniGrid is written
in JAX, designed to be highly scalable, and can potentially run on GPU or TPU
accelerators, democratizing large-scale experimentation with limited resources.
To demonstrate the generality of our library, we have implemented some
well-known single-task environments as well as new meta-learning environments
capable of generating $10^8$ distinct tasks. We have empirically shown that the
proposed environments can scale up to $2^{13}$ parallel instances on the GPU,
reaching tens of millions of steps per second.
( 2
min )
Reinforcement learning (RL) often struggles to accomplish a sparse-reward
long-horizon task in a complex environment. Goal-conditioned reinforcement
learning (GCRL) has been employed to tackle this difficult problem via a
curriculum of easy-to-reach sub-goals. In GCRL, exploring novel sub-goals is
essential for the agent to ultimately find the pathway to the desired goal. How
to explore novel sub-goals efficiently is one of the most challenging issues in
GCRL. Several goal exploration methods have been proposed to address this issue
but still struggle to find the desired goals efficiently. In this paper, we
propose a novel learning objective by optimizing the entropy of both achieved
and new goals to be explored for more efficient goal exploration in sub-goal
selection based GCRL. To optimize this objective, we first explore and exploit
the frequently occurring goal-transition patterns mined in the environments
similar to the current task to compose skills via skill learning. Then, the
pretrained skills are applied in goal exploration. Evaluation on a variety of
sparse-reward long-horizon benchmark tasks suggests that incorporating our
method into several state-of-the-art GCRL baselines significantly boosts their
exploration efficiency while improving or maintaining their performance. The
source code is available at: https://github.com/GEAPS/GEAPS.
( 3
min )
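The entropy-maximising sub-goal selection can be illustrated with a minimal sketch: score each candidate by the Shannon entropy of the combined achieved-plus-candidate goal distribution, so rarely visited goals are preferred. The scoring rule is a simplification of the paper's objective:

```python
import math
from collections import Counter

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pick_subgoal(achieved_goals, candidates):
    """Pick the candidate sub-goal that maximises the Shannon entropy of
    the goal-visit distribution after adding it: novel, rarely achieved
    goals raise the entropy most and so win the selection."""
    visits = Counter(achieved_goals)
    def score(g):
        trial = visits.copy()
        trial[g] += 1
        return entropy(list(trial.values()))
    return max(candidates, key=score)
```

Given a history dominated by goal "A", an unseen goal "C" is selected over the already-frequent ones.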
We train a language model (LM) to robustly answer multistep questions by
generating and answering sub-questions. We propose Chain-of-Questions, a
framework that trains a model to generate sub-questions and sub-answers one at
a time by leveraging human annotated question decomposition meaning
representation (QDMR). The key technical challenge is that QDMR only contains
sub-questions but not answers to those sub-questions, so we treat sub-answers
as latent variables and optimize them using a novel dynamic mixture of Hard-EM
and MAPO. Chain-of-Questions greatly outperforms strong neuro-symbolic methods
by 9.0 F1 on DROP contrast set, and outperforms GPT-3.5 by 24.3 F1 on HOTPOTQA
adversarial set, thus demonstrating the effectiveness and robustness of our
framework.
( 2
min )
The number of people suffering from various levels of hearing loss reached
1.57 billion in 2019. This large population faces difficulties on many personal
and professional levels and urgently needs to be fully included in the rest of
society. This paper presents a proof of concept of an automatic sign language
recognition system based on data obtained using a wearable device with three
flex sensors. The system is designed to interpret a selected set of American Sign
Language (ASL) dynamic words by collecting data in sequences of the performed
signs and using machine learning methods. The built models achieved
high-quality performances, such as Random Forest with 99% accuracy, Support
Vector Machine (SVM) with 99%, and two K-Nearest Neighbor (KNN) models with
98%. This indicates many possible paths toward the development of a full-scale
system.
( 2
min )
Diffusion models have demonstrated strong potential for robotic trajectory
planning. However, generating coherent and long-horizon trajectories from
high-level instructions remains challenging, especially for complex tasks
requiring multiple sequential skills. We propose SkillDiffuser, an end-to-end
hierarchical planning framework integrating interpretable skill learning with
conditional diffusion planning to address this problem. At the higher level,
the skill abstraction module learns discrete, human-understandable skill
representations from visual observations and language instructions. These
learned skill embeddings are then used to condition the diffusion model to
generate customized latent trajectories aligned with the skills. It allows for
generating diverse state trajectories that adhere to the learnable skills. By
integrating skill learning with conditional trajectory generation,
SkillDiffuser produces coherent behavior following abstract instructions across
diverse tasks. Experiments on multi-task robotic manipulation benchmarks like
Meta-World and LOReL demonstrate state-of-the-art performance and
human-interpretable skill representations from SkillDiffuser.
( 2
min )
Legged locomotion is arguably the most suited and versatile mode to deal with
natural or unstructured terrains. Intensive research into dynamic walking and
running controllers has recently yielded great advances, both in the optimal
control and reinforcement learning (RL) literature. Hopping is a challenging
dynamic task involving a flight phase and has the potential to increase the
traversability of legged robots. Model-based control for hopping typically
relies on accurate detection of different jump phases, such as lift-off or
touchdown, and on using different controllers for each phase. In this paper,
we present an end-to-end RL-based torque controller that learns to implicitly
detect the relevant jump phases, removing the need to provide manual heuristics
for state detection. We also extend a method for simulation to reality transfer
of the learned controller to contact rich dynamic tasks, resulting in
successful deployment on the robot after training without parameter tuning.
( 3
min )
Recently, large language models (LLMs) have made remarkable progress in
natural language processing. The most representative ability of LLMs is
in-context learning (ICL), which enables LLMs to learn patterns from in-context
exemplars without training. The performance of ICL greatly depends on the
exemplars used. However, how to choose exemplars remains unclear due to the
lack of understanding of how in-context learning works. In this paper, we
present a novel perspective on ICL by conceptualizing it as contextual
retrieval from a model of associative memory. We establish a theoretical
framework of ICL based on Hopfield Networks. Based on our framework, we look
into how in-context exemplars influence the performance of ICL and propose more
efficient active exemplar selection. Our study sheds new light on the mechanism
of ICL by connecting it to memory retrieval, with potential implications for
advancing the understanding of LLMs.
( 2
min )
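The associative-retrieval view of ICL can be illustrated with one modern Hopfield update, out = Σᵢ softmax(β⟨mᵢ, q⟩)ᵢ · mᵢ: stored patterns play the role of in-context exemplars, and the query snaps to the most similar one. The toy vectors and temperature below are illustrative:

```python
import math

def hopfield_retrieve(memories, query, beta=4.0):
    """One modern-Hopfield update: a softmax-weighted combination of the
    stored patterns, weighted by their similarity to the query. Large beta
    makes the update collapse onto the single closest stored pattern."""
    sims = [beta * sum(m_k * q_k for m_k, q_k in zip(m, query)) for m in memories]
    mx = max(sims)
    w = [math.exp(s - mx) for s in sims]
    z = sum(w)
    w = [wi / z for wi in w]
    dim = len(query)
    return [sum(w[i] * memories[i][k] for i in range(len(memories)))
            for k in range(dim)]
```

With two orthogonal stored patterns and a query near the first, retrieval returns (almost exactly) the first pattern.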
Lipschitz-constrained neural networks have several advantages over
unconstrained ones and can be applied to a variety of problems, making them a
topic of attention in the deep learning community. Unfortunately, it has been
shown both theoretically and empirically that they perform poorly when equipped
with ReLU activation functions. By contrast, neural networks with learnable
1-Lipschitz linear splines are known to be more expressive. In this paper, we
show that such networks correspond to global optima of a constrained functional
optimization problem that consists of the training of a neural network composed
of 1-Lipschitz linear layers and 1-Lipschitz freeform activation functions with
second-order total-variation regularization. Further, we propose an efficient
method to train these neural networks. Our numerical experiments show that our
trained networks compare favorably with existing 1-Lipschitz neural
architectures.
( 2
min )
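A learnable 1-Lipschitz linear-spline activation can be sketched as a continuous piecewise-linear function whose segment slopes are clamped into [-1, 1]; the anchoring and parametrization here are illustrative assumptions, not the paper's exact construction:

```python
def lipschitz_spline(knots, slopes, x):
    """Evaluate a continuous piecewise-linear activation that is 1-Lipschitz
    by construction: every segment slope is clamped into [-1, 1].
    knots: increasing breakpoints t_1 < ... < t_k; slopes: k+1 raw slopes
    (left tail, each interior segment, right tail). The value at the first
    knot is anchored to 0."""
    clamped = [max(-1.0, min(1.0, s)) for s in slopes]
    if x <= knots[0]:
        return clamped[0] * (x - knots[0])
    y, t_prev = 0.0, knots[0]
    for t, s in zip(knots[1:], clamped[1:-1]):
        if x <= t:
            return y + s * (x - t_prev)
        y += s * (t - t_prev)
        t_prev = t
    return y + clamped[-1] * (x - t_prev)
```

A raw interior slope of 2 is clamped to 1, so the function stays 1-Lipschitz while remaining more expressive than a fixed ReLU.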
In this paper, we explore transferability in learning between different
attack classes in a network intrusion detection setup. We evaluate
transferability of attack classes by training a deep learning model with a
specific attack class and testing it on a separate attack class. We observe the
effects of real and synthetically generated data augmentation techniques on
transferability. We investigate the nature of observed transferability
relationships, which can be either symmetric or asymmetric. We also examine
explainability of the transferability relationships using the recursive feature
elimination algorithm. We study data preprocessing techniques to boost model
performance. The code for this work can be found at
https://github.com/ghosh64/transferability.
( 2
min )
In this work we develop a novel approach using deep neural networks to
reconstruct the conductivity distribution in elliptic problems from one
measurement of the solution over the whole domain. The approach is based on a
mixed reformulation of the governing equation and utilizes the standard
least-squares objective, with deep neural networks as ansatz functions to
approximate the conductivity and flux simultaneously. We provide a thorough
analysis of the deep neural network approximations of the conductivity for both
continuous and empirical losses, including rigorous error estimates that are
explicit in terms of the noise level, various penalty parameters and neural
network architectural parameters (depth, width and parameter bound). We also
provide multiple numerical experiments in two- and multi-dimensions to
illustrate distinct features of the approach, e.g., excellent stability with
respect to data noise and capability of solving high-dimensional problems.
( 2
min )
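The mixed reformulation can be sketched in standard form (the penalty weights γ₁, γ₂ and the regulariser R below are illustrative assumptions): introduce the flux σ = −a∇u so that the second-order equation splits into first-order conditions, then fit both unknowns by least squares against the observed solution z:

```latex
-\nabla\cdot(a\,\nabla u) = f
\;\Longrightarrow\;
\sigma = -a\,\nabla u,\quad \nabla\cdot\sigma = f,
\qquad
\min_{a_\phi,\;\sigma_\psi}\;
\bigl\|\sigma_\psi + a_\phi\,\nabla z\bigr\|_{L^2(\Omega)}^2
+ \gamma_1 \bigl\|\nabla\cdot\sigma_\psi - f\bigr\|_{L^2(\Omega)}^2
+ \gamma_2\, R(a_\phi),
```

where the conductivity a_φ and flux σ_ψ are each represented by a deep neural network, matching the paper's strategy of approximating conductivity and flux simultaneously.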
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse
training library for machine learning research. JaxPruner aims to accelerate
research on sparse neural networks by providing concise implementations of
popular pruning and sparse training algorithms with minimal memory and latency
overhead. Algorithms implemented in JaxPruner use a common API and work
seamlessly with the popular optimization library Optax, which, in turn, enables
easy integration with existing JAX based libraries. We demonstrate this ease of
integration by providing examples in four different codebases: Scenic, t5x,
Dopamine, and FedJAX, and by providing baseline experiments on popular benchmarks.
( 2
min )
In this study, we propose a new activation function, called Adaptive Smooth
Activation Unit (ASAU), tailored for optimized gradient propagation, thereby
enhancing the proficiency of convolutional networks in medical image analysis.
We apply this new activation function to two important and commonly used
general tasks in medical image analysis: automatic disease diagnosis and organ
segmentation in CT and MRI. Our rigorous evaluation on the RadImageNet
abdominal/pelvis (CT and MRI) dataset and Liver Tumor Segmentation Benchmark
(LiTS) 2017 demonstrates that our ASAU-integrated frameworks not only achieve a
substantial (4.80%) improvement over ReLU in classification accuracy (disease
detection) on abdominal CT and MRI but also achieve a 1%-3% improvement in
Dice coefficient compared to widely used activations for 'healthy liver tissue'
segmentation. These improvements offer new baselines for developing a
diagnostic tool, particularly for complex, challenging pathologies. The
superior performance and adaptability of ASAU highlight its potential for
integration into a wide range of image classification and segmentation tasks.
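ASAU's exact functional form is not reproduced in this abstract; as a hedged illustration of what a smooth, adaptive alternative to ReLU can look like, consider the Swish-style unit x·σ(kx) with a tunable sharpness parameter k (an assumption for illustration, not the paper's definition):

```python
import math

def smooth_relu(x, k=10.0):
    """x * sigmoid(k * x): an everywhere-differentiable approximation of ReLU.
    Larger k sharpens the kink; making k learnable gives an adaptive activation.
    Illustrative stand-in, not the paper's ASAU."""
    return x / (1.0 + math.exp(-k * x))
```

Unlike ReLU, this unit has a nonzero gradient for slightly negative inputs, which is the kind of property claimed to help gradient propagation.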
( 2
min )
In recent years, significant progress in generative AI has highlighted the
important role of physics-inspired models that utilize advanced mathematical
concepts based on fundamental physics principles to enhance artificial
intelligence capabilities. Among these models, those based on diffusion
equations have greatly improved image quality. This study aims to explore the
potential uses of the Maxwell-Boltzmann equation, which forms the basis of the
kinetic theory of gases, and the Michaelis-Menten model in Marketing Mix
Modelling (MMM) applications. We propose incorporating these equations into
Hierarchical Bayesian models to analyse consumer behaviour in the context of
advertising. These equation sets excel in accurately describing the random
dynamics in complex systems like social interactions and consumer-advertising
interactions.
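The Michaelis-Menten model describes a saturating response, v = V_max·S/(K_m + S). In an MMM reading, advertising spend can play the substrate role; this mapping is an illustrative assumption, not the paper's exact parameterization:

```python
def michaelis_menten(spend, v_max, k_m):
    """Saturating response curve: v_max * S / (k_m + S).
    v_max is the response ceiling; k_m is the half-saturation point
    (the spend at which half the ceiling is reached)."""
    return v_max * spend / (k_m + spend)
```

The curve rises steeply at low spend and flattens toward v_max, which is exactly the diminishing-returns shape MMM practitioners fit to channel-level data.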
( 2
min )
Recent work by Marino et al. (2020) showed improved performance in sequential
density estimation by combining masked autoregressive flows with hierarchical
latent variable models. We draw a connection between such autoregressive
generative models and the task of lossy video compression. Specifically, we
view recent neural video compression methods (Lu et al., 2019; Yang et al.,
2020b; Agustsson et al., 2020) as instances of a generalized stochastic temporal
autoregressive transform, and propose avenues for enhancement based on this
insight. Comprehensive evaluations on large-scale video data show improved
rate-distortion performance over both state-of-the-art neural and conventional
video compression methods.
( 2
min )
Diffusion-based generative models represent the current state-of-the-art for
image generation. However, standard diffusion models are based on Euclidean
geometry and do not translate directly to manifold-valued data. In this work,
we develop extensions of both score-based generative models (SGMs) and
Denoising Diffusion Probabilistic Models (DDPMs) to the Lie group of 3D
rotations, SO(3). SO(3) is of particular interest in many disciplines such as
robotics, biochemistry, astronomy and cosmology. Unlike more
general Riemannian manifolds, SO(3) admits a tractable solution to heat
diffusion, which allows us to implement efficient training of diffusion models.
We apply both SO(3) DDPMs and SGMs to synthetic densities on SO(3) and
demonstrate state-of-the-art results. Additionally, we demonstrate the
practicality of our model on pose estimation tasks and in predicting correlated
galaxy orientations for astrophysics/cosmology.
( 2
min )
As large language models (LLMs) like ChatGPT have gained traction, an
increasing number of news websites have begun utilizing them to generate
articles. However, not only can these language models produce factually
inaccurate articles on reputable websites, but disreputable news sites can also
use LLMs to mass-produce misinformation. To begin to understand this
phenomenon, we present one of the first large-scale studies of the prevalence
of synthetic articles within online news media. To do this, we train a
DeBERTa-based synthetic news detector and classify over 15.90 million articles
from 3,074 misinformation and mainstream news websites. We find that between
January 1, 2022, and May 1, 2023, the relative number of synthetic news
articles increased by 55.4% on mainstream websites while increasing by 457% on
misinformation sites. We find that this increase is largely driven by smaller,
less popular websites. Analyzing the impact of the release of ChatGPT using an
interrupted time series analysis, we show that while its release resulted in a marked
increase in synthetic articles on small sites as well as misinformation news
websites, there was not a corresponding increase on large mainstream news
websites.
( 3
min )
Powered by new advances in sensor development and artificial intelligence,
the decreasing cost of computation, and the pervasiveness of handheld
computation devices, biometric user authentication (and identification) is
rapidly becoming ubiquitous. Modern approaches to biometric authentication,
based on sophisticated machine learning techniques, cannot avoid storing either
trained-classifier details or explicit user biometric data, thus exposing
users' credentials to falsification. In this paper, we introduce a secure way
to handle user-specific information involved with the use of vector-space
classifiers or artificial neural networks for biometric authentication. Our
proposed architecture, called a Neural Fuzzy Extractor (NFE), allows the
coupling of pre-existing classifiers with fuzzy extractors, through an
artificial-neural-network-based buffer called an expander, with minimal or no
performance degradation. The NFE thus offers all the performance advantages of
modern deep-learning-based classifiers, and all the security of standard fuzzy
extractors. We demonstrate the NFE retrofit to a classic artificial neural
network for a simple scenario of fingerprint-based user authentication.
( 3
min )
This paper presents the computational challenge on topological deep learning
that was hosted within the ICML 2023 Workshop on Topology and Geometry in
Machine Learning. The competition asked participants to provide open-source
implementations of topological neural networks from the literature by
contributing to the Python packages TopoNetX (data processing) and TopoModelX
(deep learning). The challenge attracted twenty-eight qualifying submissions in
its two-month duration. This paper describes the design of the challenge and
summarizes its main findings.
( 2
min )
Objective: Early identification of ADHD is necessary to provide the
opportunity for timely treatment. However, screening the symptoms of ADHD on a
large scale is not easy. This study aimed to validate a video game (FishFinder)
for the screening of ADHD using objective measurement of the core symptoms of
this disorder. Method: The FishFinder measures attention and impulsivity
through in-game performance and evaluates the child's hyperactivity using
smartphone motion sensors. This game was tested on 26 children with ADHD and 26
healthy children aged 5 to 12 years. A Support Vector Machine was employed to
detect children with ADHD. Results: The system showed 92.3% accuracy, 90%
sensitivity, and 93.7% specificity using a combination of in-game and movement
features. Conclusions: The FishFinder demonstrated a strong ability to identify
ADHD in children. The game can therefore be used as an affordable, accessible, and
enjoyable method for the objective screening of ADHD.
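The reported accuracy, sensitivity, and specificity all follow from a confusion matrix. A short sketch of the definitions, using a hypothetical confusion matrix rather than the study's raw counts:

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard screening metrics from confusion-matrix counts:
    accuracy over all cases, sensitivity (recall on true positives),
    specificity (recall on true negatives)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

# hypothetical counts for illustration only
acc, sens, spec = screening_metrics(tp=9, fp=1, tn=15, fn=1)
```

For a screening tool, high sensitivity matters most: missed cases (false negatives) mean children who never get referred for assessment.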
( 2
min )
Accurately predicting line loss rates is vital for effective line loss
management in distribution networks, especially over short-term multi-horizons
ranging from one hour to one week. In this study, we propose
Attention-GCN-LSTM, a novel method that combines Graph Convolutional Networks
(GCN), Long Short-Term Memory (LSTM), and a three-level attention mechanism to
address this challenge. By capturing spatial and temporal dependencies, our
model enables accurate forecasting of line loss rates across multiple horizons.
Through comprehensive evaluation using real-world data from 10 kV feeders, our
Attention-GCN-LSTM model consistently outperforms existing algorithms,
exhibiting superior performance in terms of prediction accuracy and
multi-horizon forecasting. This model holds significant promise for enhancing
line loss management in distribution networks.
( 2
min )
Text classification is an important topic in the field of natural language
processing. It has found preliminary application in information retrieval,
digital libraries, automatic abstracting, text filtering, word sense
discrimination, and many other areas. The aim of this research is to use a
variety of algorithms to test their ability to identify offensive posts and to
evaluate their performance against a variety of assessment methods. The
motivation for this project is to reduce the harm offensive language causes to
human moderators by automating the screening of offensive posts. The field is
new, and despite much interest in the past two years, little attention has been
paid to the target of the offence. The experiments in this project should
inspire future research on both identification methods and the content to be
identified.
( 2
min )
Causal discovery with latent variables is a crucial but challenging task.
Despite the emergence of numerous methods aimed at addressing this challenge,
they cannot fully identify the structure in which two observed variables are
influenced by one latent variable while a directed edge may also exist between
them. Interestingly, we notice that this structure can be identified through
the utilization of higher-order cumulants. By leveraging the higher-order
cumulants of non-Gaussian data, we provide an analytical solution for
estimating the causal coefficients or their ratios. With the estimated (ratios
of) causal coefficients, we propose a novel approach to identify the existence
of a causal edge between two observed variables subject to latent variable
influence. When such a causal edge exists, we introduce an asymmetry
criterion to determine the causal direction. The experimental results
demonstrate the effectiveness of our proposed method.
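The use of higher-order cumulants can be illustrated with a textbook identity: in the linear model y = b·x + e with zero-mean, skewed (non-Gaussian) x and independent noise e, E[x²y] = b·E[x³], so a ratio of third-order moments recovers the coefficient. This numpy sketch is an illustration in the spirit of the paper's approach, not its exact estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
n, b = 200_000, 1.5
x = rng.exponential(1.0, n) - 1.0       # zero-mean, skewed non-Gaussian cause
y = b * x + rng.normal(0.0, 1.0, n)     # linear effect plus independent noise

# For zero-mean x and independent noise, E[x^2 y] = b * E[x^3],
# so the third-order moment ratio identifies the causal coefficient:
b_hat = np.mean(x**2 * y) / np.mean(x**3)
```

Note the identity fails for Gaussian x, where E[x³] = 0: non-Gaussianity is what makes the coefficient identifiable here.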
( 2
min )
De novo peptide sequencing from mass spectrometry (MS) data is a critical
task in proteomics research. Traditional de novo algorithms have encountered a
bottleneck in accuracy due to the inherent complexity of proteomics data. While
deep learning-based methods have shown progress, they reduce the problem to a
translation task, potentially overlooking critical nuances between spectra and
peptides. In our research, we present ContraNovo, a pioneering algorithm that
leverages contrastive learning to extract the relationship between spectra and
peptides and incorporates the mass information into peptide decoding, aiming to
address these intricacies more efficiently. Through rigorous evaluations on two
benchmark datasets, ContraNovo consistently outshines contemporary
state-of-the-art solutions, underscoring its promising potential in enhancing
de novo peptide sequencing. The source code is available at
https://github.com/BEAM-Labs/ContraNovo.
( 2
min )
In this paper, we focus on the prediction phase of a random forest and study
the problem of representing a bag of decision trees using a smaller bag of
decision trees, where we only consider binary decision problems on the binary
domain and simple decision trees in which an internal node is limited to
querying the Boolean value of a single variable. As a main result, we show that
the majority function of $n$ variables can be represented by a bag of $T$ ($<
n$) decision trees each with polynomial size if $n-T$ is a constant, where $n$
and $T$ must be odd (in order to avoid the tie break). We also show that a bag
of $n$ decision trees can be represented by a bag of $T$ decision trees each
with polynomial size if $n-T$ is a constant and a small classification error is
allowed. A related result on the $k$-out-of-$n$ functions is presented too.
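The majority function studied above, for an odd number of Boolean inputs (oddness rules out ties), is simply:

```python
def majority(bits):
    """Majority function of an odd number of 0/1 inputs.
    Returns True iff more than half of the inputs are 1."""
    assert len(bits) % 2 == 1, "odd arity avoids ties"
    return sum(bits) > len(bits) // 2
```

The paper's question is how few (and how small) decision trees suffice for a bag whose majority vote computes exactly this function.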
( 2
min )
Drawing on theoretical insights, we advocate an error-based thresholding
(EBT) mechanism for learned ISTA (LISTA), which utilizes a function of the
layer-wise reconstruction error to suggest a specific threshold for each
observation in the shrinkage function of each layer. We show that the proposed
EBT mechanism well disentangles the learnable parameters in the shrinkage
functions from the reconstruction errors, endowing the obtained models with
improved adaptivity to possible data variations. With rigorous analyses, we
further show that the proposed EBT also leads to a faster convergence on the
basis of LISTA or its variants, in addition to its higher adaptivity. Extensive
experimental results confirm our theoretical analyses and verify the
effectiveness of our methods.
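The shrinkage function at the heart of ISTA/LISTA is elementwise soft-thresholding; EBT's proposal is to derive the threshold from the layer-wise reconstruction error rather than learn a shared constant per layer. The baseline operation, shown with a fixed threshold for illustration:

```python
import numpy as np

def soft_threshold(x, theta):
    """(L)ISTA shrinkage: sign(x) * max(|x| - theta, 0), elementwise.
    In EBT, theta would be a function of the current reconstruction error
    rather than the fixed constant used here."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

shrunk = soft_threshold(np.array([2.0, -0.5, 0.1]), 0.5)
```

Entries with magnitude at or below the threshold are zeroed, which is what produces sparse codes layer by layer.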
( 2
min )
Causal Structure Learning (CSL), amounting to extracting causal relations
among the variables in a dataset, is widely perceived as an important step
towards robust and transparent models. Constraint-based CSL leverages
conditional independence tests to perform causal discovery. We propose
Shapley-PC, a novel method to improve constraint-based CSL algorithms by using
Shapley values over the possible conditioning sets to decide which variables
are responsible for the observed conditional (in)dependences. We prove
soundness and asymptotic consistency and demonstrate that Shapley-PC can outperform
state-of-the-art constraint-based, search-based and functional causal
model-based methods, according to standard metrics in CSL.
( 2
min )
A large body of NLP research has documented the ways gender biases manifest
and amplify within large language models (LLMs), though this research has
predominantly operated within a gender binary-centric context. A growing body
of work has identified the harmful limitations of this gender-exclusive
framing; many LLMs cannot correctly and consistently refer to persons outside
the gender binary, especially if they use neopronouns. While data scarcity has
been identified as a possible culprit, the precise mechanisms through which it
influences LLM misgendering remain underexplored. Our work addresses this gap
by studying data scarcity's role in subword tokenization and, consequently, the
formation of LLM word representations. We uncover how the Byte-Pair Encoding
(BPE) tokenizer, a backbone for many popular LLMs, contributes to neopronoun
misgendering through out-of-vocabulary behavior. We introduce pronoun
tokenization parity (PTP), a novel approach to reduce LLM neopronoun
misgendering by preserving a token's functional structure. We evaluate PTP's
efficacy using pronoun consistency-based metrics and a novel syntax-based
metric. Across several controlled experiments, finetuning LLMs with PTP
improves neopronoun consistency from 14.5% to 58.4%, highlighting the
significant role tokenization plays in LLM pronoun consistency.
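The out-of-vocabulary behavior can be illustrated with a toy greedy longest-match segmenter. This is a deliberate simplification (real BPE applies a learned sequence of merge rules, not longest-match lookup), but it shows the relevant effect: pronouns in the vocabulary survive as single tokens, while a neopronoun absent from it shatters into short pieces.

```python
def greedy_subword(word, vocab):
    """Toy greedy longest-match segmentation (a simplification of BPE/WordPiece):
    match the longest vocabulary entry at each position, falling back to
    single characters, which are always allowed."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):          # try longest candidate first
            if word[i:j] in vocab or j == i + 1:   # single chars always allowed
                pieces.append(word[i:j])
                i = j
                break
    return pieces

vocab = {"she", "they", "them"}   # hypothetical tiny vocabulary
common = greedy_subword("they", vocab)   # stays whole
neo = greedy_subword("xe", vocab)        # fragments into characters
```

Fragmented pronouns get no dedicated embedding, which is one mechanism by which data scarcity can propagate into misgendering.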
( 3
min )
Chronic Obstructive Pulmonary Disorder (COPD) is a prevalent respiratory
disease that significantly impacts the quality of life of affected individuals.
This paper presents COPDFlowNet, a novel deep-learning framework that leverages
a custom Generative Adversarial Network (GAN) to generate synthetic
Computational Fluid Dynamics (CFD) velocity flow field images specific to the
trachea of COPD patients. These synthetic images serve as a valuable resource
for data augmentation and model training. Additionally, COPDFlowNet
incorporates a custom Convolutional Neural Network (CNN) architecture to
predict the location of the obstruction site.
( 2
min )
In recent years, simulations of pedestrians using multi-agent
reinforcement learning (MARL) have been studied. This study considered the
roads on a grid-world environment, and implemented pedestrians as MARL agents
using an echo-state network and the least squares policy iteration method.
Under this environment, the ability of these agents to learn to move forward by
avoiding other agents was investigated. Specifically, we considered two types
of tasks: the choice between a narrow direct route and a broad detour, and the
bidirectional pedestrian flow in a corridor. The simulation results indicated
that learning was successful when the density of agents was not too high.
( 2
min )
Multi-document summarization is the process of automatically generating a
concise summary of multiple documents related to the same topic. This summary
can help users quickly understand the key information from a large collection
of documents. Multi-document summarization systems are more complex than
single-document summarization systems due to the need to identify and combine
information from multiple sources. In this paper, we have developed a machine
learning model that generates a concise summary of a topic from multiple news
documents. The model is designed to be unbiased by sampling its input equally
from all the different aspects of the topic, even if the majority of the news
sources lean one way.
( 2
min )
Graph Neural Networks are notorious for their memory consumption. A recent
Transformer-based GNN, the Graph Transformer (GT), has been shown to obtain
superior performance when long-range dependencies exist. However, combining
graph data with the Transformer architecture compounds the memory problem. We
propose a novel version of an "edge regularization technique" that alleviates
the need for positional encoding and thereby mitigates GT's out-of-memory
issue. We observe that it is not clear whether applying edge regularization on
top of positional encoding is helpful. However, when no positional encoding is
applied, the edge regularization technique does stably improve GT's
performance.
( 2
min )
In this work we introduce Labrador, a pre-trained Transformer model for
laboratory data. Labrador and BERT were pre-trained on a corpus of 100 million
lab test results from electronic health records (EHRs) and evaluated on various
downstream outcome prediction tasks. Both models demonstrate mastery of the
pre-training task, but neither consistently outperforms XGBoost on downstream
supervised tasks. Our ablation studies reveal that transfer learning shows
limited effectiveness for BERT and achieves marginal success with Labrador. We
explore the reasons for the failure of transfer learning and suggest that the
data generating process underlying each patient cannot be characterized
sufficiently using labs alone, among other factors. We encourage future work to
focus on joint modeling of multiple EHR data categories and to include
tree-based baselines in their evaluations.
( 2
min )
Graph-based collaborative filtering methods achieve strong performance in
recommender systems because they can capture high-order information between
users and items. However, the graphs are constructed from observed user-item
interactions, which in industrial scenarios may miss links or contain spurious
positive interactions. The Bayesian Graph Neural Network framework approaches
this issue with generative models for the interaction graphs. The critical
problem is to devise a proper family of graph generative models tailored to
recommender systems. We propose an efficient generative model that jointly
considers the preferences of users, the concurrence of items and some important
graph structure information. Experiments on four popular benchmark datasets
demonstrate the effectiveness of our proposed graph generative methods for
recommender systems.
( 2
min )
Motivated by applications in queueing theory, we consider a stochastic
control problem whose state space is the $d$-dimensional positive orthant. The
controlled process $Z$ evolves as a reflected Brownian motion whose covariance
matrix is exogenously specified, as are its directions of reflection from the
orthant's boundary surfaces. A system manager chooses a drift vector
$\theta(t)$ at each time $t$ based on the history of $Z$, and the cost rate at
time $t$ depends on both $Z(t)$ and $\theta(t)$. In our initial problem
formulation, the objective is to minimize expected discounted cost over an
infinite planning horizon, after which we treat the corresponding ergodic
control problem. Extending earlier work by Han et al. (Proceedings of the
National Academy of Sciences, 2018, 8505-8510), we develop and illustrate a
simulation-based computational method that relies heavily on deep neural
network technology. For test problems studied thus far, our method is accurate
to within a fraction of one percent, and is computationally feasible in
dimensions up to at least $d=30$.
( 2
min )
We provide a systematic investigation of using physics-informed neural
networks to compute Lyapunov functions. We encode Lyapunov conditions as a
partial differential equation (PDE) and use this for training neural network
Lyapunov functions. We analyze the analytical properties of the solutions to
the Lyapunov and Zubov PDEs. In particular, we show that employing the Zubov
equation in training neural Lyapunov functions can lead to approximate regions
of attraction close to the true domain of attraction. We also examine
approximation errors and the convergence of neural approximations to the unique
solution of Zubov's equation. We then provide sufficient conditions for the
learned neural Lyapunov functions that can be readily verified by
satisfiability modulo theories (SMT) solvers, enabling formal verification of
both local stability analysis and region-of-attraction estimates in the large.
Through a number of nonlinear examples, ranging from low to high dimensions, we
demonstrate that the proposed framework can outperform traditional
sums-of-squares (SOS) Lyapunov functions obtained using semidefinite
programming (SDP).
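The conditions being verified reduce to V(x) > 0 and ∇V(x)·f(x) < 0 away from the origin. A numerical spot-check for the toy stable system x' = -x with the quadratic candidate V(x) = ||x||² (a sanity check only; the formal guarantee in the paper comes from SMT solvers, not sampling):

```python
import numpy as np

def f(x):
    """Toy stable dynamics: x' = -x."""
    return -x

def V(x):
    """Quadratic Lyapunov candidate V(x) = ||x||^2."""
    return float(x @ x)

def V_dot(x):
    """Lie derivative along the flow: dV/dt = grad V . f(x) = 2 x . f(x)."""
    return float(2.0 * x @ f(x))

rng = np.random.default_rng(1)
samples = rng.uniform(-1.0, 1.0, size=(1000, 2))
samples = samples[np.linalg.norm(samples, axis=1) > 1e-3]  # exclude the origin
ok = all(V(x) > 0 and V_dot(x) < 0 for x in samples)       # Lyapunov conditions hold
```

For this system V_dot(x) = -2||x||², so the decrease condition holds everywhere except the origin, certifying global asymptotic stability.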
( 2
min )
We find a succinct expression for computing the sequence $x_t = a_t x_{t-1} +
b_t$ in parallel with two prefix sums, given $t = (1, 2, \dots, n)$, $a_t \in
\mathbb{R}^n$, $b_t \in \mathbb{R}^n$, and initial value $x_0 \in \mathbb{R}$.
On $n$ parallel processors, the computation of $n$ elements incurs
$\mathcal{O}(\log n)$ time and $\mathcal{O}(n)$ space. Sequences of this form
are ubiquitous in science and engineering, making efficient parallelization
useful for a vast number of applications. We implement our expression in
software, test it on parallel hardware, and verify that it executes faster than
sequential computation by a factor of $\frac{n}{\log n}$.
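The closed form behind the two prefix sums: letting A_t = ∏_{i≤t} a_i, the recurrence unrolls to x_t = A_t (x_0 + Σ_{i≤t} b_i / A_i). A numpy sketch in which cumprod/cumsum stand in for the parallel prefix operations:

```python
import numpy as np

def scan_recurrence(a, b, x0):
    """Compute x_t = a_t * x_{t-1} + b_t for all t via two prefix operations.
    Closed form: with A_t = prod_{i<=t} a_i, x_t = A_t * (x0 + sum_{i<=t} b_i / A_i).
    Dividing by A_t directly can over/underflow on long sequences; a robust
    implementation would carry the products in log space."""
    A = np.cumprod(a)
    return A * (x0 + np.cumsum(b / A))

a = np.array([0.9, 1.1, 0.8])
b = np.array([1.0, -0.5, 0.25])
x = scan_recurrence(a, b, x0=2.0)
```

On parallel hardware each cumprod/cumsum becomes an O(log n)-depth prefix scan, which is the source of the claimed speedup over the sequential loop.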
( 2
min )
Air pollution is a result of multiple sources including both natural and
anthropogenic activities. The rapid urbanization of cities such as Bujumbura,
the economic capital of Burundi, is one of these factors. This paper presents
the first characterization of the spatio-temporal variability of PM2.5 in
Bujumbura and forecasts PM2.5 concentration using data collected over one year,
from August 2022 to August 2023, by low-cost sensors installed across the city.
For each commune, hourly, daily, and seasonal analyses were carried out, and
the results showed that the mass concentrations of PM2.5 differ from one
commune to another. The average hourly and annual PM2.5 concentrations exceed
World Health Organization standards, ranging between 28.3 and 35.0 µg/m³. To
predict PM2.5 concentration, a recurrent neural network (RNN) with Long
Short-Term Memory (LSTM) was investigated.
( 2
min )
Over the last decade, the Dip-test of unimodality has gained increasing
interest in the data mining community as it is a parameter-free statistical
test that reliably rates the modality in one-dimensional samples. It returns a
so-called Dip-value and a corresponding probability for the sample's
unimodality (Dip-p-value). These two values share a sigmoidal relationship.
However, the specific transformation is dependent on the sample size. Many
Dip-based clustering algorithms use bootstrapped look-up tables translating
Dip- to Dip-p-values for a limited set of sample sizes. We propose a
specifically designed sigmoid function as a substitute for these
state-of-the-art look-up tables. This accelerates computation and provides an
approximation of the Dip- to Dip-p-value transformation for every single sample
size. Further, it is differentiable and can therefore easily be integrated in
learning schemes using gradient descent. We showcase this by exploiting our
function in a novel subspace clustering algorithm called Dip'n'Sub. We
highlight in extensive experiments the various benefits of our proposal.
( 3
min )
Time-series anomaly detection deals with the problem of detecting anomalous
timesteps by learning normality from the sequence of observations. However, the
concept of normality evolves over time, leading to a "new normal problem",
where the distribution of normality can be changed due to the distribution
shifts between training and test data. This paper highlights the prevalence of
the new normal problem in unsupervised time-series anomaly detection studies.
To tackle this issue, we propose a simple yet effective test-time adaptation
strategy based on trend estimation and a self-supervised approach to learning
new normalities during inference. Extensive experiments on real-world
benchmarks demonstrate that incorporating the proposed strategy into the
anomaly detector consistently improves the model's performance compared to the
baselines, leading to robustness to the distribution shifts.
( 2
min )
We provide an optimized implementation of the forward pass of
FlashAttention-2, a popular memory-aware scaled dot-product attention
algorithm, as a custom fused CUDA kernel targeting NVIDIA Hopper architecture
and written using the open-source CUTLASS library. In doing so, we explain the
challenges and techniques involved in fusing online-softmax with back-to-back
GEMM kernels, utilizing the Hopper-specific Tensor Memory Accelerator (TMA) and
Warpgroup Matrix-Multiply-Accumulate (WGMMA) instructions, defining and
transforming CUTLASS Layouts and Tensors, overlapping copy and GEMM operations,
and choosing optimal tile sizes for the Q, K and V attention matrices while
balancing the register pressure and shared memory utilization. In head-to-head
benchmarks on a single H100 PCIe GPU for some common choices of
hyperparameters, we observe 20-50% higher FLOPs/s over a version of
FlashAttention-2 optimized for last-generation NVIDIA Ampere architecture.
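The online-softmax being fused here processes the score vector block by block, carrying a running maximum and a rescaled normalizer so the full row of attention scores is never materialized. A numpy sketch of the accumulation (the paper's contribution is the fused CUDA kernel and its Hopper-specific tiling, not this recurrence itself):

```python
import numpy as np

def online_softmax(scores, block=4):
    """Streaming softmax over blocks: keep a running max m and a normalizer s
    rescaled whenever the max grows, as in FlashAttention's online softmax."""
    m, s = -np.inf, 0.0
    for start in range(0, len(scores), block):
        chunk = scores[start:start + block]
        m_new = max(m, chunk.max())
        # rescale the accumulated normalizer to the new max, then add the chunk
        s = s * np.exp(m - m_new) + np.exp(chunk - m_new).sum()
        m = m_new
    # second pass only to emit probabilities; the fused kernel instead
    # rescales its accumulated output rows on the fly
    return np.exp(scores - m) / s

x = np.array([0.1, 2.0, -1.0, 3.0, 0.5, -2.0])
```

The single pass over blocks is what lets the kernel interleave softmax accumulation with the back-to-back GEMMs instead of waiting for the whole score row.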
( 2
min )
This paper introduces a novel approach for topic modeling utilizing latent
codebooks from a Vector-Quantized Variational Auto-Encoder (VQ-VAE), which
discretely encapsulate the rich information of pre-trained embeddings, such as
those from a pre-trained language model. Based on a novel interpretation of the
latent codebooks and embeddings as a conceptual bag-of-words, we propose a new
generative topic model called Topic-VQ-VAE (TVQ-VAE), which inversely generates
the original documents related to the respective latent codebook. The TVQ-VAE
can visualize the topics with various generative distributions including the
traditional BoW distribution and the autoregressive image generation. Our
experimental results on document analysis and image generation demonstrate that
TVQ-VAE effectively captures the topic context which reveals the underlying
structures of the dataset and supports flexible forms of document generation.
Official implementation of the proposed TVQ-VAE is available at
https://github.com/clovaai/TVQ-VAE.
( 2
min )
The unprecedented performance of machine learning models in recent years,
particularly Deep Learning and transformer models, has resulted in their
application in various domains such as finance, healthcare, and education.
However, the models are error-prone and cannot be used autonomously, especially
in decision-making scenarios where, technically or ethically, the cost of error
is high. Moreover, because of the black-box nature of these models, it is
frequently difficult for the end user to comprehend the models' outcomes and
underlying processes to trust and use the model outcome to make a decision.
Explainable Artificial Intelligence (XAI) aids end-user understanding of the
model by utilizing approaches, including visualization techniques, to explain
and interpret the inner workings of the model and how it arrives at a result.
Although numerous research studies have been conducted recently focusing on the
performance of models and the XAI approaches, less work has been done on the
impact of explanations on human-AI team performance. This paper surveys
recent empirical studies on XAI's impact on human-AI decision-making,
identifies the challenges, and proposes future research directions.
( 2
min )
In this article we offer a comprehensive analysis of the Urysohn classifier
in a binary classification context. It utilizes Urysohn's Lemma from topology to
construct separating functions, providing rigorous and adaptable solutions.
Numerical experiments demonstrated exceptional performance, with scores ranging
from 95% to 100%. Notably, the Urysohn classifier outperformed CatBoost and
KNN in various scenarios. Despite sensitivity to the p-metric parameter, it
proved robust and adaptable. The Urysohn classifier's mathematical rigor and
adaptability make it promising for binary classification, with applications in
medical diagnosis, fraud detection and cyber security. Future research includes
parameter optimization and combining the Urysohn classifier with other
techniques. It offers an elegant and principled approach to classification,
ensuring integrity and valuable data insights.
( 2
min )
Recent advances in autonomous robotic technologies have highlighted the
growing need for precise environmental analysis. LiDAR semantic segmentation
has gained attention to accomplish fine-grained scene understanding by acting
directly on raw content provided by sensors. Recent solutions showed how
different learning techniques can be used to improve the performance of the
model, without any architectural or dataset change. Following this trend, we
present a coarse-to-fine setup that LEArns from classification mistaKes (LEAK)
derived from a standard model. First, classes are clustered into macro groups
according to mutual prediction errors; then, the learning process is
regularized by: (1) aligning class-conditional prototypical feature
representation for both fine and coarse classes, (2) weighting instances with a
per-class fairness index. Our LEAK approach is very general and can be
seamlessly applied on top of any segmentation architecture; indeed,
experimental results showed that it enables state-of-the-art performances on
different architectures, datasets and tasks, while ensuring more balanced
class-wise results and faster convergence.
( 2
min )
Offline reinforcement learning (RL) aims to learn an effective policy from a
pre-collected dataset. Most existing works focus on developing sophisticated
learning algorithms, with less emphasis on improving the data collection
process. Moreover, it is challenging to extend beyond the single-task setting
and collect a task-agnostic dataset that allows an agent to perform multiple
downstream tasks. In this paper, we propose a Curiosity-driven Unsupervised
Data Collection (CUDC) method to expand feature space using adaptive temporal
distances for task-agnostic data collection and ultimately improve learning
efficiency and capabilities for multi-task offline RL. To achieve this, CUDC
estimates the probability of the k-step future states being reachable from the
current states, and adapts how many steps into the future the dynamics
model should predict. With this adaptive reachability mechanism in place, the
feature representation can be diversified, and the agent can navigate itself to
collect higher-quality data with curiosity. Empirically, CUDC surpasses
existing unsupervised methods in efficiency and learning performance in various
downstream offline RL tasks of the DeepMind control suite.
( 2
min )
The Distributional Random Forest (DRF) is a recently introduced Random Forest
algorithm to estimate multivariate conditional distributions. Due to its
general estimation procedure, it can be employed to estimate a wide range of
targets such as conditional average treatment effects, conditional quantiles,
and conditional correlations. However, only results about the consistency and
convergence rate of the DRF prediction are available so far. We characterize
the asymptotic distribution of DRF and develop a bootstrap approximation of it.
This allows us to derive inferential tools for quantifying standard errors and
the construction of confidence regions that have asymptotic coverage
guarantees. In simulation studies, we empirically validate the developed theory
for inference of low-dimensional targets and for testing distributional
differences between two populations.
( 2
min )
We establish novel rates for the Gaussian approximation of random deep neural
networks with Gaussian parameters (weights and biases) and Lipschitz activation
functions, in the wide limit. Our bounds apply to the joint output of a
network evaluated on any finite input set, provided a certain non-degeneracy
condition of the infinite-width covariances holds. We demonstrate that the
distance between the network output and the corresponding Gaussian
approximation scales inversely with the width of the network, exhibiting faster
convergence than the naive heuristic suggested by the central limit theorem. We
also apply our bounds to obtain theoretical approximations for the exact
Bayesian posterior distribution of the network, when the likelihood is a
bounded Lipschitz function of the network output evaluated on a (finite)
training set. This includes popular cases such as the Gaussian likelihood, i.e.
exponential of minus the mean squared error.
( 2
min )
Today we are excited to announce that the Llama Guard model is now available for customers using Amazon SageMaker JumpStart. Llama Guard provides input and output safeguards in large language model (LLM) deployment. It’s one of the components under Purple Llama, Meta’s initiative featuring open trust and safety tools and evaluations to help developers build […]
( 15
min )
In this post, you learn how to prepare data sourced from Amazon Security Lake, and then train and deploy an ML model using an IP Insights algorithm in SageMaker. This model identifies anomalous network traffic or behavior which can then be composed as part of a larger end-to-end security solution.
( 13
min )
In this issue of Research Focus: Optimized exit-augmented models for scalable efficient inference; NeurIPS LLM Efficiency Challenge; LLM-empowered automated data exploration; Boosting cloud efficiency with data-driven decision-making and optimization.
The post Research Focus: Week of December 18, 2023 appeared first on Microsoft Research.
( 9
min )
Outside the glare of the klieg lights that ChatGPT commanded this year, a troupe of autonomous machines nudged the frontiers of robotics forward. Here are six that showed special prowess — swimming, diving, gripping, seeing, strolling and flying through 2023. A Media Darling at CES Ella — a smart stroller from startup Glüxkind Technologies, of …
( 7
min )
Thomson Reuters, the global content and technology company, is transforming the legal industry with generative AI. In the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with Thomson Reuters Chief Product Officer David Wong about its potential — and implications. Many of Thomson Reuters offerings for the legal industry either address an information …
( 6
min )
The latest OpenUSD updates enable users to tackle larger, more complex scenes with enhanced geometry control and streamlined asset management.
( 7
min )
These compounds can kill methicillin-resistant Staphylococcus aureus (MRSA), a bacterium that causes deadly infections.
( 10
min )
This new method draws on 200-year-old geometric foundations to give artists control over the appearance of animated characters.
( 10
min )
The latest industrial inference engines, such as FasterTransformer and
TurboTransformers, have verified that half-precision floating point (FP16) and
8-bit integer (INT8) quantization can greatly improve model inference speed.
However, existing INT8 quantization methods are complicated, and improper
usage can greatly degrade model performance. In this paper, we
develop a toolkit for users to easily quantize their models for inference, in
which Self-Adaptive Mixed-Precision (SAMP) is proposed to automatically control
quantization rate by a mixed-precision architecture to balance model accuracy
and efficiency. Experimental results show that our SAMP toolkit has a higher
speedup than PyTorch and FasterTransformer while ensuring the required
accuracy. In addition, SAMP is based on a modular design, decoupling the
tokenizer, embedding, encoder and target layers, which allows users to handle
various downstream tasks and can be seamlessly integrated into PyTorch.
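As a simplified illustration of the INT8 building block that toolkits like SAMP automate, the sketch below shows generic symmetric per-tensor quantization of a weight matrix; this is a standard technique, not SAMP's actual mixed-precision scheme.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8: scale so the largest weight maps to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(err <= s / 2 + 1e-6)  # rounding error is at most half a quantization step
```

A mixed-precision scheme decides, per layer, whether this INT8 path or an FP16 path gives a better accuracy/speed trade-off.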
( 2
min )
Online social media is integral to human life, facilitating messaging,
information sharing, and confidential communication while preserving privacy.
Platforms like Twitter, Instagram, and Facebook exemplify this phenomenon.
However, users face challenges due to network anomalies, often stemming from
malicious activities such as identity theft for financial gain or harm. This
paper proposes a novel method using user similarity measures and the Generative
Adversarial Network (GAN) algorithm to identify fake user accounts in the
Twitter dataset. Despite the problem's complexity, the method achieves an AUC
of 80% in classifying and detecting fake accounts. Notably, the study
builds on previous research, highlighting advancements and insights into the
evolving landscape of anomaly detection in online social networks.
( 2
min )
Much research has been devoted to the problem of learning fair
representations; however, existing methods do not explicitly model the
relationships between latent representations, even though in many real-world
applications there may be causal relationships between them. Furthermore, most fair
representation learning methods focus on group-level fairness and are based on
correlations, ignoring the causal relationships underlying the data. In this
work, we theoretically demonstrate that using structured representations
enables downstream predictive models to achieve counterfactual fairness, and
then we propose the Counterfactual Fairness Variational AutoEncoder (CF-VAE) to
obtain structured representations with respect to domain knowledge. The
experimental results show that the proposed method achieves better fairness and
accuracy performance than the benchmark fairness methods.
( 2
min )
In this paper, we interpret disentanglement as the discovery of local charts
of the data manifold and trace how this definition naturally leads to an
equivalent condition for disentanglement: commutativity between factors of
variation. We study the impact of this manifold framework on two classes of
problems: learning matrix exponential operators and compressing data-generating
models. In each problem, the manifold perspective yields interesting results
about the feasibility of, and fruitful approaches to, their solutions. We also link our
manifold framework to two other common disentanglement paradigms: group
theoretic and probabilistic approaches to disentanglement. In each case, we
show how these frameworks can be merged with our manifold perspective.
Importantly, we recover commutativity as a central property in both alternative
frameworks, further highlighting its importance in disentanglement.
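The commutativity condition can be illustrated numerically with hypothetical transformation generators in homogeneous coordinates: two translations commute, while a rotation and a translation do not.

```python
import numpy as np

def commutator_norm(A, B):
    """Factors commute iff the commutator [A, B] = AB - BA vanishes."""
    return np.linalg.norm(A @ B - B @ A)

# Generators acting on (x, y, 1) homogeneous coordinates:
Tx = np.array([[0., 0., 1.], [0., 0., 0.], [0., 0., 0.]])   # translate along x
Ty = np.array([[0., 0., 0.], [0., 0., 1.], [0., 0., 0.]])   # translate along y
R  = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 0.]])  # rotate about origin

print(commutator_norm(Tx, Ty))     # 0.0: translations are disentangleable
print(commutator_norm(R, Tx) > 0)  # True: rotation and translation entangle
```

In the manifold view, vanishing commutators mean the two factors generate independent local charts.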
( 2
min )
We introduce Mesogeos, a large-scale multi-purpose dataset for wildfire
modeling in the Mediterranean. Mesogeos integrates variables representing
wildfire drivers (meteorology, vegetation, human activity) and historical
records of wildfire ignitions and burned areas for 17 years (2006-2022). It is
designed as a cloud-friendly spatio-temporal dataset, namely a datacube,
harmonizing all variables in a grid of 1km x 1km x 1-day resolution. The
datacube structure offers opportunities to assess machine learning (ML) usage
in various wildfire modeling tasks. We extract two ML-ready datasets that
establish distinct tracks to demonstrate this potential: (1) short-term
wildfire danger forecasting and (2) final burned area estimation given the
point of ignition. We define appropriate metrics and baselines to evaluate the
performance of models in each track. By publishing the datacube, along with the
code to create the ML datasets and models, we encourage the community to foster
the implementation of additional tracks for mitigating the increasing threat of
wildfires in the Mediterranean.
( 2
min )
We study hypothesis testing under communication constraints, where each
sample is quantized before being revealed to a statistician. Without
communication constraints, it is well known that the sample complexity of
simple binary hypothesis testing is characterized by the Hellinger distance
between the distributions. We show that the sample complexity of simple binary
hypothesis testing under communication constraints is at most a logarithmic
factor larger than in the unconstrained setting, and this bound is tight. We
develop a polynomial-time algorithm that achieves the aforementioned sample
complexity. Our framework extends to robust hypothesis testing, where the
distributions are corrupted in the total variation distance. Our proofs rely on
a new reverse data processing inequality and a reverse Markov inequality, which
may be of independent interest. For simple $M$-ary hypothesis testing, the
sample complexity in the absence of communication constraints has a logarithmic
dependence on $M$. We show that communication constraints can cause an
exponential blow-up leading to $\Omega(M)$ sample complexity even for adaptive
algorithms.
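A toy sketch of the setting (not the paper's algorithm): each Gaussian sample is quantized to a single threshold bit before the statistician sees it, yet a simple frequency test still distinguishes the two hypotheses given enough samples.

```python
import numpy as np
from math import erf, sqrt

def quantized_test(mu, n, tau=0.5, seed=1):
    """Each of n samples ~ N(mu, 1) is revealed only as the bit 1[x > tau].
    Decide H0: mu = 0 vs H1: mu = 1 by comparing the fraction of 1-bits
    with the midpoint of the two expected bit frequencies."""
    rng = np.random.default_rng(seed)
    bits = (rng.normal(mu, 1.0, size=n) > tau).astype(int)
    p0 = 0.5 * (1 - erf(tau / sqrt(2)))        # P(x > tau | H0), ~0.31
    p1 = 0.5 * (1 - erf((tau - 1) / sqrt(2)))  # P(x > tau | H1), ~0.69
    return int(bits.mean() > (p0 + p1) / 2)    # 1 = decide H1

decision_h0 = quantized_test(mu=0.0, n=2000)
decision_h1 = quantized_test(mu=1.0, n=2000)
print(decision_h0, decision_h1)  # correct despite seeing one bit per sample
```

The paper's result quantifies how much larger n must be under such quantization than in the unconstrained setting.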
( 2
min )
Traditional statistical feature selection methods often struggle when applied
to high-dimension, low-sample-size data, encountering problems such as
overfitting, the curse of dimensionality, computational infeasibility, and
strong model assumptions. In this paper, we propose a novel
two-step nonparametric approach called Deep Feature Screening (DeepFS) that can
overcome these problems and identify significant features with high precision
for ultra high-dimensional, low-sample-size data. This approach first extracts
a low-dimensional representation of input data and then applies feature
screening based on multivariate rank distance correlation recently developed by
Deb and Sen (2021). This approach combines the strengths of both deep neural
networks and feature screening, and thereby has the following appealing
features in addition to its ability to handle ultra high-dimensional data
with a small number of samples: (1) it is model-free and distribution-free; (2)
it can be used for both supervised and unsupervised feature selection; and (3)
it is capable of recovering the original input data. The superiority of DeepFS
is demonstrated via extensive simulation studies and real data analyses.
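The screening step can be sketched with the classical (unranked) distance correlation, used here as a simplified stand-in for the multivariate rank distance correlation of Deb and Sen that the paper employs.

```python
import numpy as np

def distance_correlation(x, y):
    """Empirical distance correlation (Szekely et al.) between 1-D samples;
    it vanishes in the population limit iff x and y are independent."""
    def centered(a):
        d = np.abs(a[:, None] - a[None, :])
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()
    dvar = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max(dcov2, 0.0) / dvar)

rng = np.random.default_rng(0)
x = rng.normal(size=500)
dep = distance_correlation(x, x ** 2)                  # nonlinear dependence
indep = distance_correlation(x, rng.normal(size=500))  # unrelated noise
print(dep > indep)  # the dependent feature scores higher, so it survives screening
```

In DeepFS this score is computed between each learned feature and the response; features with the largest scores are retained.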
( 2
min )
In this paper, we provide a geometric interpretation of the structure of Deep
Learning (DL) networks, characterized by $L$ hidden layers, a ReLU ramp
activation function, an $\mathcal{L}^2$ Schatten class (or Hilbert-Schmidt)
cost function, and input and output spaces $\mathbb{R}^Q$ with equal dimension
$Q\geq1$. The hidden layers are also defined on $\mathbb{R}^{Q}$; the training
input size $N$ can be arbitrarily large - thus, we are considering the
underparametrized regime. We apply our recent results on shallow neural
networks to construct an explicit family of minimizers for the global minimum
of the cost function in the case $L\geq Q$, which we show to be degenerate. In
the context presented here, the hidden layers of the DL network "curate" the
training inputs by recursive application of a truncation map that minimizes the
noise to signal ratio of the training inputs. Moreover, we determine a set of
$2^Q-1$ distinct degenerate local minima of the cost function. Our
constructions make no use of gradient descent algorithms at all.
( 3
min )
As AI systems become more intelligent and their behavior becomes more
challenging to assess, they may learn to game the flaws of human feedback
instead of genuinely striving to follow instructions; however, this risk can be
mitigated by controlling how LLMs generalize human feedback to situations where
it is unreliable. To better understand how reward models generalize, we craft
69 distribution shifts spanning 8 categories. We find that reward models do not
learn to evaluate 'instruction-following' by default and instead favor personas
that resemble internet text. Techniques for interpreting reward models'
internal representations achieve better generalization than standard
fine-tuning, but still frequently fail to distinguish instruction-following
from conflated behaviors. We consolidate the 15 most challenging distribution
shifts into the GENeralization analogIES (GENIES) benchmark, which we hope will
enable progress toward controlling reward model generalization.
( 2
min )
Stochastic Gradient Descent (SGD) is an out-of-equilibrium algorithm used
extensively to train artificial neural networks. However, very little is known
about the extent to which SGD is crucial to the success of this technology and,
in particular, how effective it is at optimizing high-dimensional non-convex
cost functions compared to other optimization algorithms such as Gradient
Descent (GD). In this work we leverage dynamical mean field theory to benchmark
its performance in the high-dimensional limit. To do so, we consider the
problem of recovering a hidden high-dimensional non-linearly encrypted signal,
a prototypical hard high-dimensional non-convex optimization problem. We
compare the performance of SGD to that of GD and show that SGD largely outperforms GD for
sufficiently small batch sizes. In particular, a power law fit of the
relaxation time of these algorithms shows that the recovery threshold for SGD
with small batch size is smaller than the corresponding one of GD.
( 2
min )
With the rapid growth of edge intelligence, the deployment of federated
learning (FL) over wireless networks has garnered increasing attention, which
is called Federated Edge Learning (FEEL). In FEEL, mobile devices both
transmit model parameters over noisy channels and collect data in diverse
environments, which poses challenges to the generalization of trained models.
Moreover, devices can engage in decentralized FL via Device-to-Device
communication while the communication topology of connected devices also
impacts the generalization of models. Most recent theoretical studies overlook
the incorporation of all these effects into FEEL when developing generalization
analyses. In contrast, our work presents an information-theoretic
generalization analysis for topology-aware FEEL in the presence of data
heterogeneity and noisy channels. Additionally, we propose a novel
regularization method called Federated Global Mutual Information Reduction
(FedGMIR) to enhance the performance of models based on our analysis. Numerical
results validate our theoretical findings and provide evidence for the
effectiveness of the proposed method.
( 2
min )
Predictive algorithms are often trained by optimizing some loss function, to
which regularization functions are added to impose a penalty for violating
constraints. As expected, the addition of such regularization functions can
change the minimizer of the objective. It is not well-understood which
regularizers change the minimizer of the loss, and, when the minimizer does
change, how it changes. We use property elicitation to take first steps towards
understanding the joint relationship between the loss and regularization
functions and the optimal decision for a given problem instance. In particular,
we give a necessary and sufficient condition on loss and regularizer pairs for
when a property changes with the addition of the regularizer, and examine some
regularizers standard in the fair machine learning literature that satisfy this
condition. We empirically demonstrate how algorithmic decision-making changes
as a function of both data distribution changes and hardness of the
constraints.
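The phenomenon can be sketched in one parameter: squared loss elicits the mean, and adding an L1 regularizer moves the minimizer, i.e., changes the elicited property. The data and penalty weight below are arbitrary.

```python
import numpy as np

data = np.array([1.0, 2.0, 6.0])
thetas = np.linspace(-1.0, 6.0, 7001)  # grid search, step 0.001

def argmin_theta(lam):
    """Minimize mean squared loss plus an L1 penalty over the grid."""
    obj = ((data[:, None] - thetas[None, :]) ** 2).mean(axis=0)
    obj = obj + lam * np.abs(thetas)
    return thetas[obj.argmin()]

m_plain = argmin_theta(0.0)  # squared loss alone elicits the mean, 3.0
m_reg = argmin_theta(2.0)    # the penalty shifts the optimum to mean - lam/2
print(m_plain, m_reg)
```

Here the regularized objective no longer elicits the mean but a shrunk statistic, which is exactly the kind of property change the paper characterizes.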
( 2
min )
In the domain of music and sound processing, pitch extraction plays a pivotal
role. Our research presents a specialized convolutional neural network designed
for pitch extraction, particularly from the human singing voice in acapella
performances. Notably, our approach combines synthetic data with auto-labeled
acapella sung audio, creating a robust training environment. Evaluation across
datasets comprising synthetic sounds, opera recordings, and time-stretched
vowels demonstrates its efficacy. This work paves the way for enhanced pitch
extraction in both music and voice settings.
( 2
min )
Second-order methods for deep learning -- such as KFAC -- can be useful for
neural net training. However, they are often memory-inefficient and numerically
unstable for low-precision training since their preconditioning Kronecker
factors are dense, and require high-precision matrix inversion or
decomposition. Consequently, such methods are not widely used for training
large neural networks such as transformer-based models. We address these two
issues by (i) formulating an inverse-free update of KFAC and (ii) imposing
structures in each of the Kronecker factors, resulting in a method we term
structured inverse-free natural gradient descent (SINGD). On large modern
neural networks, we show that, in contrast to KFAC, SINGD is memory efficient
and numerically robust, and often outperforms AdamW even in half precision.
Hence, our work closes a gap between first-order and second-order methods in
modern low precision training for large neural nets.
( 2
min )
This paper considers learning the hidden causal network of a linear networked
dynamical system (NDS) from the time series data at some of its nodes --
partial observability. The dynamics of the NDS are driven by colored noise that
generates spurious associations across pairs of nodes, rendering the problem
much harder. To address the challenge of noise correlation and partial
observability, we assign to each pair of nodes a feature vector computed from
the time series data of observed nodes. The feature embedding is engineered to
yield structural consistency: there exists an affine hyperplane that
consistently partitions the set of features, separating the feature vectors
corresponding to connected pairs of nodes from those corresponding to
disconnected pairs. The causal inference problem is thus addressed via
clustering the designed features. Using simple baseline supervised methods, we
demonstrate the competitive performance of the proposed causal inference
mechanism under broad connectivity regimes and noise correlation levels,
including on a real-world network. Further, we devise novel technical guarantees
of structural consistency for linear NDS under the considered regime.
( 3
min )
We introduce ZeroSCROLLS, a zero-shot benchmark for natural language
understanding over long texts, which contains only test and small validation
sets, without training data. We adapt six tasks from the SCROLLS benchmark, and
add four new datasets, including two novel information fusing tasks, such as
aggregating the percentage of positive reviews. Using ZeroSCROLLS, we conduct a
comprehensive evaluation of both open-source and closed large language models,
finding that Claude outperforms ChatGPT, and that GPT-4 achieves the highest
average score. However, there is still room for improvement on multiple open
challenges in ZeroSCROLLS, such as aggregation tasks, where models struggle to
pass the naive baseline. As the state of the art is a moving target, we invite
researchers to evaluate their ideas on the live ZeroSCROLLS leaderboard.
( 2
min )
Artificial intelligence (AI) and machine learning (ML) present revolutionary
opportunities to enhance our understanding of animal behavior and conservation
strategies. Using elephants, a crucial species in Africa's protected areas, as
our focal point, we delve into the role of AI and ML in their conservation.
Given the increasing amounts of data gathered from a variety of sensors like
cameras, microphones, geophones, drones, and satellites, the challenge lies in
managing and interpreting this vast data. New AI and ML techniques offer
solutions to streamline this process, helping us extract vital information that
might otherwise be overlooked. This paper focuses on the different AI-driven
monitoring methods and their potential for improving elephant conservation.
Collaborative efforts between AI experts and ecological researchers are
essential in leveraging these innovative technologies for enhanced wildlife
conservation, setting a precedent for numerous other species.
( 2
min )
We present a new data-driven topological data analysis (TDA) approach for
estimating state spaces in dynamically changing human functional brain
networks. Our approach penalizes the topological distance between networks and
clusters dynamically changing brain networks into topologically distinct
states. Our method takes into account the temporal dimension of the data
through the Wasserstein distance between networks. Our method is shown to
outperform the widely used k-means clustering often used in estimating the
state space in brain networks. The method is applied to more accurately
determine the state spaces of dynamically changing functional brain networks.
Subsequently, we address the question of whether the overall topology of brain
networks is a heritable feature using the twin study design. MATLAB code for
the method is available at https://github.com/laplcebeltrami/PH-STAT.
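As a much-simplified sketch of Wasserstein-based comparison, networks are reduced here to samples of edge weights and compared with the closed-form 1-D Wasserstein-1 distance; the paper works with richer topological summaries of each network.

```python
import numpy as np

def wasserstein_1d(u, v):
    """For equal-size 1-D samples, matching order statistics gives the
    Wasserstein-1 distance in closed form."""
    return np.abs(np.sort(u) - np.sort(v)).mean()

rng = np.random.default_rng(0)
# Edge-weight samples from three toy networks: two from one "state", one shifted
net_a = rng.normal(0.0, 1.0, size=200)
net_b = rng.normal(0.0, 1.0, size=200)
net_c = rng.normal(1.5, 1.0, size=200)
d_same = wasserstein_1d(net_a, net_b)
d_diff = wasserstein_1d(net_a, net_c)
print(d_same < d_diff)  # networks from the same state are closer
```

Clustering on such pairwise distances groups time points into topologically distinct states.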
( 2
min )
We propose a new homotopy-based conditional gradient method for solving
convex optimization problems with a large number of simple conic constraints.
Instances of this template naturally appear in semidefinite programming
problems arising as convex relaxations of combinatorial optimization problems.
Our method is a double-loop algorithm in which the conic constraint is treated
via a self-concordant barrier, and the inner loop employs a conditional
gradient algorithm to approximate the analytic central path, while the outer
loop updates the accuracy imposed on the temporal solution and the homotopy
parameter. Our theoretical iteration complexity is competitive when confronted
to state-of-the-art SDP solvers, with the decisive advantage of cheap
projection-free subroutines. Preliminary numerical experiments are provided for
illustrating the practical performance of the method.
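The inner conditional gradient loop can be sketched on a toy instance: a quadratic over the probability simplex rather than a conic feasible set, and exact line search instead of the paper's accuracy schedule.

```python
import numpy as np

def frank_wolfe_simplex(c, x0, steps=500):
    """Conditional gradient for min ||x - c||^2 over the probability simplex.
    The linear minimization oracle just picks the coordinate with the
    smallest gradient entry; exact line search sets the step size."""
    x = x0.copy()
    for _ in range(steps):
        g = 2.0 * (x - c)              # gradient of the quadratic
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # a vertex of the simplex
        d = s - x
        gamma = np.clip(np.dot(c - x, d) / np.dot(d, d), 0.0, 1.0)
        x = x + gamma * d              # convex combination stays feasible
    return x

c = np.array([0.2, 0.5, 0.3])          # target already inside the simplex
x = frank_wolfe_simplex(c, np.array([1.0, 0.0, 0.0]))
gap = np.abs(x - c).max()
print(gap)  # iterates approach the constrained minimizer c
```

The appeal, as in the paper, is that each iteration needs only a cheap linear oracle and no projection.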
( 2
min )
In this paper, we introduce a novel predict-and-optimize method for
profit-driven churn prevention. We frame the task of targeting customers for a
retention campaign as a regret minimization problem. The main objective is to
leverage individual customer lifetime values (CLVs) to ensure that only the
most valuable customers are targeted. In contrast, many profit-driven
strategies focus on churn probabilities while considering average CLVs. This
often results in significant information loss due to data aggregation. Our
proposed model aligns with the guidelines of Predict-and-Optimize (PnO)
frameworks and can be efficiently solved using stochastic gradient descent
methods. Results from 12 churn prediction datasets underscore the effectiveness
of our approach, which achieves the best average performance compared to other
well-established strategies in terms of average profit.
( 2
min )
Deep Neural Networks are prone to learning spurious correlations embedded in
the training data, leading to potentially biased predictions. This poses risks
when deploying these models for high-stake decision-making, such as in medical
applications. Current methods for post-hoc model correction either require
input-level annotations which are only possible for spatially localized biases,
or augment the latent feature space, thereby hoping to enforce the right
reasons. We present a novel method for model correction on the concept level
that explicitly reduces model sensitivity towards biases via gradient
penalization. When modeling biases via Concept Activation Vectors, we highlight
the importance of choosing robust directions, as traditional regression-based
approaches such as Support Vector Machines tend to result in diverging
directions. We effectively mitigate biases in controlled and real-world
settings on the ISIC, Bone Age, ImageNet and CelebA datasets using VGG, ResNet
and EfficientNet architectures. Code is available on
https://github.com/frederikpahde/rrclarc.
( 2
min )
We study the problem of learning causal representations from unknown, latent
interventions in a general setting, where the latent distribution is Gaussian
but the mixing function is completely general. We prove strong identifiability
results given unknown single-node interventions, i.e., without having access to
the intervention targets. This generalizes prior works which have focused on
weaker classes, such as linear maps or paired counterfactual data. This is also
the first instance of causal identifiability from non-paired interventions for
deep neural network embeddings. Our proof relies on carefully uncovering the
high-dimensional geometric structure present in the data distribution after a
non-linear density transformation, which we capture by analyzing quadratic
forms of precision matrices of the latent distributions. Finally, we propose a
contrastive algorithm to identify the latent variables in practice and evaluate
its performance on various tasks.
( 2
min )
Neuro-Symbolic (NeSy) predictive models hold the promise of improved
compliance with given constraints, systematic generalization, and
interpretability, as they allow inferring labels that are consistent with some
prior knowledge by reasoning over high-level concepts extracted from
sub-symbolic inputs. It was recently shown that NeSy predictors are affected by
reasoning shortcuts: they can attain high accuracy but by leveraging concepts
with unintended semantics, thus coming short of their promised advantages. Yet,
a systematic characterization of reasoning shortcuts and of potential
mitigation strategies is missing. This work fills this gap by characterizing
them as unintended optima of the learning objective and identifying four key
conditions behind their occurrence. Based on this, we derive several natural
mitigation strategies, and analyze their efficacy both theoretically and
empirically. Our analysis shows reasoning shortcuts are difficult to deal with,
casting doubts on the trustworthiness and interpretability of existing NeSy
solutions.
( 2
min )
Recent works have shown that modern deep learning models can exhibit a
sparse double descent phenomenon: as the sparsity of the model increases, the
test performance first worsens because the model overfits the training data;
then the overfitting lessens, leading to improved performance; and finally the
model begins to forget critical information, resulting in underfitting. This
behavior prevents the use of traditional early-stopping criteria. In this work,
we make three key contributions. First, we propose
a learning framework that avoids such a phenomenon and improves generalization.
Second, we introduce an entropy measure that provides more insight into the
emergence of this phenomenon and enables the use of traditional stopping
criteria. Third, we provide a comprehensive quantitative analysis of contingent
factors such as re-initialization methods, model width and depth, and dataset
noise. The contributions are supported by empirical evidence in typical setups.
Our code is available at https://github.com/VGCQ/DSD2.
( 2
min )
Riemannian submanifold optimization with momentum is computationally
challenging because, to ensure that the iterates remain on the submanifold, we
often need to solve difficult differential equations. Here, we simplify such
difficulties for a class of sparse or structured symmetric positive-definite
matrices with the affine-invariant metric. We do so by proposing a generalized
version of the Riemannian normal coordinates that dynamically orthonormalizes
the metric and locally converts the problem into an unconstrained problem in
the Euclidean space. We use our approach to simplify existing approaches for
structured covariances and develop matrix-inverse-free $2^\text{nd}$-order
optimizers for deep learning with low precision by using only matrix
multiplications. Code: https://github.com/yorkerlin/StructuredNGD-DL
( 2
min )
We present CrystalBox, a novel, model-agnostic, posthoc explainability
framework for Deep Reinforcement Learning (DRL) controllers in the large family
of input-driven environments which includes computer systems. We combine the
natural decomposability of reward functions in input-driven environments with
the explanatory power of decomposed returns. We propose an efficient algorithm
to generate future-based explanations across both discrete and continuous
control environments. Using applications such as adaptive bitrate streaming and
congestion control, we demonstrate CrystalBox's capability to generate
high-fidelity explanations. We further illustrate its higher utility across
three practical use cases: contrastive explanations, network observability, and
guided reward design, as opposed to prior explainability techniques that
identify salient features.
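The decomposed returns CrystalBox builds on can be sketched directly: discount each reward component separately and note that the per-component returns sum to the ordinary return. Component names and data here are hypothetical.

```python
import numpy as np

def decomposed_returns(reward_components, gamma=0.99):
    """Discount each reward term separately; reward_components has shape
    (T, K), one column per term (e.g. bitrate, rebuffering, smoothness)."""
    T = reward_components.shape[0]
    discounts = gamma ** np.arange(T)
    return discounts @ reward_components  # per-component returns, shape (K,)

rng = np.random.default_rng(0)
r = rng.normal(size=(100, 3))  # toy trajectory with three reward terms
per_component = decomposed_returns(r)
total = decomposed_returns(r.sum(axis=1, keepdims=True))[0]
print(np.isclose(per_component.sum(), total))  # components add up to the return
```

Explaining a controller's decision in terms of these per-component future returns is what makes the explanations human-readable.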
( 2
min )
We study simple binary hypothesis testing under both local differential
privacy (LDP) and communication constraints. We qualify our results as either
minimax optimal or instance optimal: the former hold for the set of
distribution pairs with prescribed Hellinger divergence and total variation
distance, whereas the latter hold for specific distribution pairs. For the
sample complexity of simple hypothesis testing under pure LDP constraints, we
establish instance-optimal bounds for distributions with binary support;
minimax-optimal bounds for general distributions; and (approximately)
instance-optimal, computationally efficient algorithms for general
distributions. When both privacy and communication constraints are present, we
develop instance-optimal, computationally efficient algorithms that achieve the
minimum possible sample complexity (up to universal constants). Our results on
instance-optimal algorithms hinge on identifying the extreme points of the
joint range set $\mathcal A$ of two distributions $p$ and $q$, defined as
$\mathcal A := \{(\mathbf T p, \mathbf T q) | \mathbf T \in \mathcal C\}$,
where $\mathcal C$ is the set of channels characterizing the constraints.
( 2
min )
Transfer learning (TL) from pretrained deep models is a standard practice in
modern medical image classification (MIC). However, what levels of features to
be reused are problem-dependent, and uniformly finetuning all layers of
pretrained models may be suboptimal. This insight has partly motivated the
recent differential TL strategies, such as TransFusion (TF) and layer-wise
finetuning (LWFT), which treat the layers in the pretrained models
differentially. In this paper, we add one more strategy into this family,
called TruncatedTL, which reuses and finetunes appropriate bottom layers and
directly discards the remaining layers. This yields not only superior MIC
performance but also compact models for efficient inference, compared to other
differential TL methods. Our code is available at:
https://github.com/sun-umn/TTL
( 2
min )
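The truncation idea above can be illustrated with a minimal NumPy sketch, in which a "pretrained model" is just a hypothetical stack of (weight, bias) layers: the bottom k layers are reused and everything above them is discarded in favor of a fresh linear head. All names and shapes here are illustrative stand-ins, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained network: a stack of (W, b) layers.
pretrained = [(rng.standard_normal((8, 8)) * 0.1, np.zeros(8)) for _ in range(6)]

def forward(layers, x):
    """Run x through the stack: ReLU on hidden layers, linear final layer."""
    for W, b in layers[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = layers[-1]
    return x @ W + b

def truncated_tl(pretrained_layers, k, n_classes):
    """Keep only the bottom k layers and attach a fresh linear head;
    the remaining top layers are discarded entirely (the TruncatedTL idea)."""
    bottom = pretrained_layers[:k]  # reused (and finetuned in practice)
    head = (rng.standard_normal((8, n_classes)) * 0.1, np.zeros(n_classes))
    return bottom + [head]

model = truncated_tl(pretrained, k=2, n_classes=3)
logits = forward(model, rng.standard_normal(8))
```

In practice the kept bottom layers would be finetuned jointly with the new head; the sketch only shows the structural surgery.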
Modeling and synthesizing real sRGB noise is crucial for various low-level
vision tasks. The distribution of real sRGB noise is highly complex and
affected by a multitude of factors, making its accurate modeling extremely
challenging. Therefore, recent studies have proposed methods that employ
data-driven generative models, such as generative adversarial networks (GAN)
and Normalizing Flows. These studies achieve more accurate modeling of sRGB
noise compared to traditional noise modeling methods. However, there are
performance limitations due to the inherent characteristics of each generative
model. To address this issue, we propose NM-FlowGAN, a hybrid approach that
exploits the strengths of both GAN and Normalizing Flows. We simultaneously
employ a pixel-wise noise modeling network based on Normalizing Flows, and
spatial correlation modeling networks based on GAN. In our experiments, our
NM-FlowGAN outperforms other baselines on the sRGB noise synthesis task.
Moreover, the denoising neural network, trained with synthesized image pairs
from our model, also shows superior performance compared to other baselines.
Our code is available at: https://github.com/YoungJooHan/NM-FlowGAN
( 2
min )
Although promising, existing defenses against query-based attacks share a
common limitation: they offer increased robustness against attacks at the price
of a considerable accuracy drop on clean samples. In this work, we show how to
efficiently establish, at test-time, a solid tradeoff between robustness and
accuracy when mitigating query-based attacks. Given that these attacks
necessarily explore low-confidence regions, our insight is that activating
dedicated defenses, such as RND (Qin et al., NeurIPS 2021) and Random Image
Transformations (Xie et al., ICLR 2018), only for low-confidence inputs is
sufficient to prevent them. Our approach is independent of training and
supported by theory. We verify the effectiveness of our approach for various
existing defenses by conducting extensive experiments on CIFAR-10, CIFAR-100,
and ImageNet. Our results confirm that our proposal can indeed enhance these
defenses by providing better tradeoffs between robustness and accuracy when
compared to state-of-the-art approaches while being completely training-free.
( 2
min )
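The gating idea above can be sketched in a few lines: apply a randomized defense (here, RND-style input noise) only when the clean prediction falls below a confidence threshold. The linear classifier and all parameter values are toy stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify(x, W):
    """Toy linear classifier standing in for the undefended model."""
    return softmax(x @ W)

def gated_predict(x, W, tau=0.9, sigma=0.1):
    """Activate the randomized defense only for low-confidence inputs,
    where query-based attacks must operate; answer normally otherwise."""
    probs = classify(x, W)
    if probs.max() >= tau:  # high confidence: clean accuracy is preserved
        return probs
    noisy = x + sigma * rng.standard_normal(x.shape)  # defended path
    return classify(noisy, W)

W = rng.standard_normal((4, 3))
p = gated_predict(rng.standard_normal(4), W)
```

Because high-confidence inputs bypass the defense entirely, clean accuracy is untouched on those inputs, which is the source of the improved tradeoff.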
We introduce a new technique called Drapes to enhance the sensitivity in
searches for new physics at the LHC. By training diffusion models on side-band
data, we show how background templates for the signal region can be generated
either directly from noise, or by partially applying the diffusion process to
existing data. In the partial diffusion case, data can be drawn from side-band
regions, with the inverse diffusion performed for new target conditional
values, or from the signal region, preserving the distribution over the
conditional property that defines the signal region. We apply this technique to
the hunt for resonances using the LHCO di-jet dataset, and achieve
state-of-the-art performance for background template generation using high
level input features. We also show how Drapes can be applied to low level
inputs with jet constituents, reducing the model dependence on the choice of
input observables. Using jet constituents we can further improve sensitivity to
the signal process, but observe a loss in performance where the signal
significance before applying any selection is below 4$\sigma$.
( 2
min )
Catastrophic forgetting remains a challenge for neural networks, especially
in lifelong learning scenarios. In this study, we introduce MEtaplasticity from
Synaptic Uncertainty (MESU), inspired by metaplasticity and Bayesian inference
principles. MESU harnesses synaptic uncertainty to retain information over
time, with its update rule closely approximating the diagonal Newton's method
for synaptic updates. Through continual learning experiments on permuted MNIST
tasks, we demonstrate MESU's remarkable capability to maintain learning
performance across 100 tasks without the need for explicit task boundaries.
( 2
min )
Applications of large language models (LLMs) like ChatGPT have the potential
to enhance clinical decision support through conversational interfaces.
However, challenges of human-algorithmic interaction and clinician trust are
poorly understood. GutGPT, an LLM for gastrointestinal (GI) bleeding risk prediction
and management guidance, was deployed in clinical simulation scenarios
alongside the electronic health record (EHR) with emergency medicine
physicians, internal medicine physicians, and medical students to evaluate its
effect on physician acceptance and trust in AI clinical decision support
systems (AI-CDSS). GutGPT provides risk predictions from a validated machine
learning model and evidence-based answers by querying extracted clinical
guidelines. Participants were randomized to GutGPT and an interactive
dashboard, or the interactive dashboard and a search engine. Surveys and
educational assessments administered before and after the sessions measured
technology acceptance and content mastery. Preliminary results showed mixed
effects on acceptance after using GutGPT compared to the dashboard or search
engine, but GutGPT appeared to improve content mastery based on simulation
performance. Overall, this study
demonstrates LLMs like GutGPT could enhance effective AI-CDSS if implemented
optimally and paired with interactive interfaces.
( 3
min )
A novel method, the Pareto Envelope Augmented with Reinforcement Learning
(PEARL), has been developed to address the challenges posed by multi-objective
problems, particularly in the field of engineering where the evaluation of
candidate solutions can be time-consuming. PEARL distinguishes itself from
traditional policy-based multi-objective Reinforcement Learning methods by
learning a single policy, eliminating the need for multiple neural networks to
independently solve simpler sub-problems. Several versions inspired by deep
learning and evolutionary techniques have been crafted, catering to both
unconstrained and constrained problem domains. Curriculum Learning is harnessed
to effectively manage constraints in these versions. PEARL's performance is
first evaluated on classical multi-objective benchmarks. Additionally, it is
tested on two practical PWR core Loading Pattern optimization problems to
showcase its real-world applicability. The first problem involves optimizing
the Cycle length and the rod-integrated peaking factor as the primary
objectives, while the second problem incorporates the mean average enrichment
as an additional objective. Furthermore, PEARL addresses three types of
constraints related to boron concentration, peak pin burnup, and peak pin
power. The results are systematically compared against a conventional approach,
the Non-dominated Sorting Genetic Algorithm. Notably, PEARL, specifically the
PEARL-NdS variant, efficiently uncovers a Pareto front without necessitating
additional efforts from the algorithm designer, as opposed to a single
optimization with scaled objectives. It also outperforms the classical approach
across multiple performance metrics, including the Hyper-volume.
( 3
min )
Car following (CF) models are fundamental to describing traffic dynamics.
However, the CF behavior of human drivers is highly stochastic and nonlinear.
As a result, identifying the best CF model has been challenging and
controversial despite decades of research. Introduction of automated vehicles
has further complicated this matter as their CF controllers remain proprietary,
though their behavior appears different than human drivers. This paper develops
a stochastic learning approach to integrate multiple CF models, rather than
relying on a single model. The framework is based on approximate Bayesian
computation that probabilistically concatenates a pool of CF models based on
their relative likelihood of describing observed behavior. The approach, while
data-driven, retains physical tractability and interpretability. Evaluation
results using two datasets show that the proposed approach can better reproduce
vehicle trajectories for both human driven and automated vehicles than any
single CF model considered.
( 2
min )
Riemannian submanifold optimization with momentum is computationally
challenging because, to ensure that the iterates remain on the submanifold, we
often need to solve difficult differential equations. Here, we simplify such
difficulties for a class of sparse or structured symmetric positive-definite
matrices with the affine-invariant metric. We do so by proposing a generalized
version of the Riemannian normal coordinates that dynamically orthonormalizes
the metric and locally converts the problem into an unconstrained problem in
the Euclidean space. We use our approach to simplify existing approaches for
structured covariances and develop matrix-inverse-free $2^\text{nd}$-order
optimizers for deep learning with low precision by using only matrix
multiplications. Code: https://github.com/yorkerlin/StructuredNGD-DL
( 2
min )
Predictive algorithms are often trained by optimizing some loss function, to
which regularization functions are added to impose a penalty for violating
constraints. As expected, the addition of such regularization functions can
change the minimizer of the objective. It is not well-understood which
regularizers change the minimizer of the loss, and, when the minimizer does
change, how it changes. We use property elicitation to take first steps towards
understanding the joint relationship between the loss and regularization
functions and the optimal decision for a given problem instance. In particular,
we give a necessary and sufficient condition on loss and regularizer pairs for
when a property changes with the addition of the regularizer, and examine some
regularizers satisfying this condition standard in the fair machine learning
literature. We empirically demonstrate how algorithmic decision-making changes
as a function of both data distribution changes and hardness of the
constraints.
( 2
min )
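A one-dimensional NumPy example makes the phenomenon concrete: squared loss elicits the mean, but adding an L1 penalty toward zero moves the minimizer (the familiar soft-thresholding effect). The data and penalty weight are arbitrary illustrative choices.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])
grid = np.linspace(-1, 4, 5001)  # candidate values of the decision theta

def loss(theta):
    """Mean squared loss of each candidate theta against the data."""
    return np.mean((data[:, None] - theta[None, :]) ** 2, axis=0)

plain = grid[np.argmin(loss(grid))]  # unregularized minimizer: the mean

lam = 2.0  # L1 penalty weight; shifts the minimizer toward zero
regularized = grid[np.argmin(loss(grid) + lam * np.abs(grid))]
```

Here the unregularized minimizer is the mean (2.0), while the penalized minimizer shifts to mean - lam/2 = 1.0: the regularizer changes not just the value but which property of the data distribution the objective elicits.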
Neuro-Symbolic (NeSy) predictive models hold the promise of improved
compliance with given constraints, systematic generalization, and
interpretability, as they allow inferring labels that are consistent with some
prior knowledge by reasoning over high-level concepts extracted from
sub-symbolic inputs. It was recently shown that NeSy predictors are affected by
reasoning shortcuts: they can attain high accuracy but by leveraging concepts
with unintended semantics, thus coming short of their promised advantages. Yet,
a systematic characterization of reasoning shortcuts and of potential
mitigation strategies is missing. This work fills this gap by characterizing
them as unintended optima of the learning objective and identifying four key
conditions behind their occurrence. Based on this, we derive several natural
mitigation strategies, and analyze their efficacy both theoretically and
empirically. Our analysis shows reasoning shortcuts are difficult to deal with,
casting doubts on the trustworthiness and interpretability of existing NeSy
solutions.
( 2
min )
We introduce ZeroSCROLLS, a zero-shot benchmark for natural language
understanding over long texts, which contains only test and small validation
sets, without training data. We adapt six tasks from the SCROLLS benchmark, and
add four new datasets, including two novel information fusing tasks, such as
aggregating the percentage of positive reviews. Using ZeroSCROLLS, we conduct a
comprehensive evaluation of both open-source and closed large language models,
finding that Claude outperforms ChatGPT, and that GPT-4 achieves the highest
average score. However, there is still room for improvement on multiple open
challenges in ZeroSCROLLS, such as aggregation tasks, where models struggle to
pass the naive baseline. As the state of the art is a moving target, we invite
researchers to evaluate their ideas on the live ZeroSCROLLS leaderboard.
( 2
min )
We revisit the general framework introduced by Fazlyab et al. (SIAM J. Optim.
28, 2018) to construct Lyapunov functions for optimization algorithms in
discrete and continuous time. For smooth, strongly convex objective functions,
we relax the requirements necessary for such a construction. As a result we are
able to prove for Polyak's ordinary differential equations and for a
two-parameter family of Nesterov algorithms rates of convergence that improve
on those available in the literature. We analyse the interpretation of Nesterov
algorithms as discretizations of the Polyak equation. We show that the
algorithms are instances of Additive Runge-Kutta integrators and discuss the
reasons why most discretizations of the differential equation do not result in
optimization algorithms with acceleration. We also introduce a modification of
Polyak's equation and study its convergence properties. Finally we extend the
general framework to the stochastic scenario and consider an application to
random algorithms with acceleration for overparameterized models; again we are
able to prove convergence rates that improve on those in the literature.
( 2
min )
In this paper, we provide a geometric interpretation of the structure of Deep
Learning (DL) networks, characterized by $L$ hidden layers, a ReLU ramp
activation function, an $\mathcal{L}^2$ Schatten class (or Hilbert-Schmidt)
cost function, and input and output spaces $\mathbb{R}^Q$ with equal dimension
$Q\geq1$. The hidden layers are also defined on $\mathbb{R}^{Q}$; the training
input size $N$ can be arbitrarily large - thus, we are considering the
underparametrized regime. We apply our recent results on shallow neural
networks to construct an explicit family of minimizers for the global minimum
of the cost function in the case $L\geq Q$, which we show to be degenerate. In
the context presented here, the hidden layers of the DL network "curate" the
training inputs by recursive application of a truncation map that minimizes the
noise to signal ratio of the training inputs. Moreover, we determine a set of
$2^Q-1$ distinct degenerate local minima of the cost function. Our
constructions make no use of gradient descent algorithms at all.
( 3
min )
We study the problem of learning causal representations from unknown, latent
interventions in a general setting, where the latent distribution is Gaussian
but the mixing function is completely general. We prove strong identifiability
results given unknown single-node interventions, i.e., without having access to
the intervention targets. This generalizes prior works which have focused on
weaker classes, such as linear maps or paired counterfactual data. This is also
the first instance of causal identifiability from non-paired interventions for
deep neural network embeddings. Our proof relies on carefully uncovering the
high-dimensional geometric structure present in the data distribution after a
non-linear density transformation, which we capture by analyzing quadratic
forms of precision matrices of the latent distributions. Finally, we propose a
contrastive algorithm to identify the latent variables in practice and evaluate
its performance on various tasks.
( 2
min )
In this paper, we interpret disentanglement as the discovery of local charts
of the data manifold and trace how this definition naturally leads to an
equivalent condition for disentanglement: commutativity between factors of
variation. We study the impact of this manifold framework to two classes of
problems: learning matrix exponential operators and compressing data-generating
models. In each problem, the manifold perspective yields interesting results
about the feasibility of, and fruitful approaches to, their solutions. We also link our
manifold framework to two other common disentanglement paradigms: group
theoretic and probabilistic approaches to disentanglement. In each case, we
show how these frameworks can be merged with our manifold perspective.
Importantly, we recover commutativity as a central property in both alternative
frameworks, further highlighting its importance in disentanglement.
( 2
min )
In modern federated learning, one of the main challenges is to account for
inherent heterogeneity and the diverse nature of data distributions for
different clients. This problem is often addressed by introducing
personalization of the models towards the data distribution of the particular
client. However, a personalized model might be unreliable when applied to the
data that is not typical for this client. Eventually, it may perform worse for
these data than the non-personalized global model trained in a federated way on
the data from all the clients. This paper presents a new approach to federated
learning that selects, for each input point, whichever of the global and
personalized models is expected to perform better. This is achieved through a
careful modeling of predictive uncertainties that helps to detect local and
global in- and out-of-distribution data and use this information to select the
model that is confident in a prediction. The comprehensive experimental
evaluation on the popular real-world image datasets shows the superior
performance of the model in the presence of out-of-distribution data while
performing on par with state-of-the-art personalized federated learning
algorithms in the standard scenarios.
( 2
min )
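A toy stand-in for the selection step: pick, per input, the model whose predictive distribution is more confident (lower entropy). The paper's actual mechanism models in- and out-of-distribution uncertainty more carefully; this sketch only illustrates the per-input switching.

```python
import numpy as np

def entropy(p):
    """Predictive entropy of a probability vector (a simple uncertainty proxy)."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum()

def select_prediction(p_global, p_personal):
    """Return the prediction of whichever model is more confident on
    this input; the personalized model wins ties."""
    return p_personal if entropy(p_personal) <= entropy(p_global) else p_global

p_g = np.array([0.4, 0.3, 0.3])    # global model: unsure on this input
p_p = np.array([0.9, 0.05, 0.05])  # personalized model: confident
chosen = select_prediction(p_g, p_p)
```

For inputs that are out-of-distribution for the client, the personalized model's entropy rises and the rule falls back to the global model, matching the behavior described above.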
In this paper, we explore the capability of both the Adjacency Spectral
Embedding (ASE) and the Graph Encoder Embedding (GEE) for capturing an embedded
pseudo-clique structure in the random dot product graph setting. In both theory
and experiments, we demonstrate that this pairing of model and methods can
yield worse results than the best existing spectral clique detection methods,
demonstrating at once the methods' potential inability to capture even modestly
sized pseudo-cliques and the methods' robustness to the model contamination
giving rise to the pseudo-clique structure. To further enrich our analysis, we
also consider the Variational Graph Auto-Encoder (VGAE) model in our simulation
and real data experiments.
( 2
min )
Block majorization-minimization (BMM) is a simple iterative algorithm for
nonconvex optimization that sequentially minimizes a majorizing surrogate of
the objective function in each block coordinate while the other block
coordinates are held fixed. We consider a family of BMM algorithms for
minimizing smooth nonconvex objectives, where each parameter block is
constrained within a subset of a Riemannian manifold. We establish that this
algorithm converges asymptotically to the set of stationary points, and attains
an $\epsilon$-stationary point within $\widetilde{O}(\epsilon^{-2})$
iterations. In particular, the assumptions for our complexity results are
completely Euclidean when the underlying manifold is a product of Euclidean or
Stiefel manifolds, although our analysis makes explicit use of the Riemannian
geometry. Our general analysis applies to a wide range of algorithms with
Riemannian constraints: Riemannian MM, block projected gradient descent,
optimistic likelihood estimation, geodesically constrained subspace tracking,
robust PCA, and Riemannian CP-dictionary-learning. We experimentally validate
that our algorithm converges faster than standard Euclidean algorithms applied
to the Riemannian setting.
( 2
min )
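One of the simplest instances of BMM is alternating least squares for a rank-one factorization, with two Euclidean blocks: each block update exactly minimizes the objective (so the surrogate coincides with the objective restricted to that block), and the objective is monotonically non-increasing. This sketch is a Euclidean special case, not the Riemannian setting of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 6))

def objective(u, v):
    """Squared Frobenius error of the rank-one approximation u v^T."""
    return np.linalg.norm(A - np.outer(u, v)) ** 2

# Two-block BMM: alternately minimize over u with v fixed, then over v
# with u fixed; each block update is an exact (closed-form) minimization.
u = rng.standard_normal(10)
v = rng.standard_normal(6)
vals = [objective(u, v)]
for _ in range(20):
    u = A @ v / (v @ v)    # minimize over block u
    v = A.T @ u / (u @ u)  # minimize over block v
    vals.append(objective(u, v))
```

The monotone decrease of `vals` is exactly the descent property that the paper's asymptotic and $\widetilde{O}(\epsilon^{-2})$ complexity analysis builds on, generalized to Riemannian block constraints.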
Neural networks are powerful tools in various applications, and quantifying
their uncertainty is crucial for reliable decision-making. In the deep learning
field, the uncertainties are usually categorized into aleatoric (data) and
epistemic (model) uncertainty. In this paper, we point out that the existing
popular variance attenuation method highly overestimates aleatoric uncertainty.
To address this issue, we propose a new estimation method by actively
de-noising the observed data (source code available at
https://github.com/wz16/DVA). By conducting a broad range of
experiments, we demonstrate that our proposed approach provides a much closer
approximation to the actual data uncertainty than the standard method.
( 2
min )
Current deep learning algorithms designed for automatic ECG analysis have
exhibited notable accuracy. However, akin to traditional electrocardiography,
they tend to be narrowly focused and typically address a singular diagnostic
condition. In this study, we specifically demonstrate the capability of a
single model to predict a diverse range of both cardiac and non-cardiac
discharge diagnoses based on a sole ECG collected in the emergency department.
Among the 1,076 hierarchically structured ICD codes considered, our model
achieves an AUROC exceeding 0.8 in 439 of them. This underscores the model's
proficiency in handling a wide array of diagnostic scenarios. We emphasize the
potential of utilizing this model as a screening tool, potentially integrated
into a holistic clinical decision support system for efficiently triaging
patients in the emergency department. This research underscores the remarkable
capabilities of comprehensive ECG analysis algorithms and the extensive range
of possibilities facilitated by the open MIMIC-IV-ECG dataset. Finally, our
data may play a pivotal role in revolutionizing the way ECG analysis is
performed, marking a significant advancement in the field.
( 2
min )
Traditional statistical feature selection methods often struggle on
high-dimension, low-sample-size data, encountering problems such as
overfitting, the curse of dimensionality, computational infeasibility, and
strong model assumptions. In this paper, we propose a novel
two-step nonparametric approach called Deep Feature Screening (DeepFS) that can
overcome these problems and identify significant features with high precision
for ultra high-dimensional, low-sample-size data. This approach first extracts
a low-dimensional representation of input data and then applies feature
screening based on multivariate rank distance correlation recently developed by
Deb and Sen (2021). This approach combines the strengths of both deep neural
networks and feature screening, and thereby has the following appealing
features in addition to its ability to handle ultra high-dimensional data
with a small number of samples: (1) it is model free and distribution free; (2)
it can be used for both supervised and unsupervised feature selection; and (3)
it is capable of recovering the original input data. The superiority of DeepFS
is demonstrated via extensive simulation studies and real data analyses.
( 2
min )
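The screening step can be sketched as follows. For simplicity this uses plain (univariate) distance correlation rather than the multivariate rank version of Deb and Sen (2021) that the paper actually employs, and a hand-made nonlinear signal stands in for the learned low-dimensional representation.

```python
import numpy as np

def dist_corr(x, y):
    """Sample distance correlation between two 1-D samples: zero iff
    (asymptotically) independent, so it detects nonlinear dependence."""
    def centered(a):
        d = np.abs(a[:, None] - a[None, :])
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    dx, dy = centered(x), centered(y)
    dcov2 = (dx * dy).mean()
    dvar = np.sqrt((dx * dx).mean() * (dy * dy).mean())
    return np.sqrt(max(dcov2, 0.0) / dvar) if dvar > 0 else 0.0

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
z = X[:, 3] ** 2  # stand-in for the learned low-dimensional representation

# Screen: score each input feature by its dependence on the representation.
scores = np.array([dist_corr(X[:, j], z) for j in range(p)])
selected = np.argsort(scores)[::-1][:3]
```

Feature 3 drives the representation nonlinearly, so a correlation-based screen would miss it while the distance-correlation screen ranks it first; this is the "model free, distribution free" property claimed above.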
Random Forest (RF) is a machine learning method that offers many advantages,
including the ability to easily measure variable importance. Class balancing
is a well-known technique for dealing with class imbalance, but its effect on
RF variable importance has not been actively studied. In this
paper, we study the effect of class balancing on RF variable importance. Our
simulation results show that over-sampling is effective in correctly measuring
variable importance in class imbalanced situations with small sample size,
while under-sampling fails to differentiate important and non-informative
variables. We then propose a variable selection algorithm that utilizes RF
variable importance and its confidence interval. Through an experimental study
using many real and artificial datasets, we demonstrate that our proposed
algorithm efficiently selects an optimal feature set, leading to improved
prediction performance in class imbalance problems.
( 2
min )
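The over-sampling step the simulations rely on can be sketched in NumPy; the random forest itself is omitted here (in practice its importance measure would then be computed on the balanced data). The toy dataset, with one informative and one noise feature, is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def oversample(X, y):
    """Random over-sampling: resample every class with replacement up to
    the majority-class count, so classes are balanced afterwards."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = np.concatenate([
        rng.choice(np.where(y == c)[0], size=n_max, replace=True)
        for c in classes
    ])
    return X[idx], y[idx]

# Imbalanced toy data: feature 0 is informative, feature 1 is pure noise.
n_maj, n_min = 190, 10
X = np.vstack([rng.standard_normal((n_maj, 2)),
               rng.standard_normal((n_min, 2)) + [2.0, 0.0]])
y = np.array([0] * n_maj + [1] * n_min)

Xb, yb = oversample(X, y)
```

After balancing, both classes contribute equally to split decisions, which is why (per the abstract) over-sampling lets RF importance separate the informative feature from the noise feature even at small sample sizes.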
We study hypothesis testing under communication constraints, where each
sample is quantized before being revealed to a statistician. Without
communication constraints, it is well known that the sample complexity of
simple binary hypothesis testing is characterized by the Hellinger distance
between the distributions. We show that the sample complexity of simple binary
hypothesis testing under communication constraints is at most a logarithmic
factor larger than in the unconstrained setting and this bound is tight. We
develop a polynomial-time algorithm that achieves the aforementioned sample
complexity. Our framework extends to robust hypothesis testing, where the
distributions are corrupted in the total variation distance. Our proofs rely on
a new reverse data processing inequality and a reverse Markov inequality, which
may be of independent interest. For simple $M$-ary hypothesis testing, the
sample complexity in the absence of communication constraints has a logarithmic
dependence on $M$. We show that communication constraints can cause an
exponential blow-up leading to $\Omega(M)$ sample complexity even for adaptive
algorithms.
( 2
min )
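The Hellinger characterization mentioned above is easy to compute: the unconstrained sample complexity of simple binary testing scales as $\Theta(1/H^2(p,q))$, where $H^2$ is the squared Hellinger distance. A small NumPy example with arbitrary distributions:

```python
import numpy as np

def hellinger_sq(p, q):
    """Squared Hellinger distance H^2(p, q) = 1 - sum_i sqrt(p_i * q_i)
    between two discrete distributions given as probability vectors."""
    return 1.0 - np.sum(np.sqrt(p * q))

p = np.array([0.5, 0.5])
q = np.array([0.6, 0.4])
h2 = hellinger_sq(p, q)
order_of_samples = 1.0 / h2  # sample complexity is Theta(1 / H^2)
```

Per the abstract, imposing communication constraints inflates this by at most a logarithmic factor in the binary case, while $M$-ary testing can blow up from $\log M$ to $\Omega(M)$.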
We consider the problem of inferring latent stochastic differential equations
(SDEs) with a time and memory cost that scales independently with the amount of
data, the total length of the time series, and the stiffness of the approximate
differential equations. This is in stark contrast to typical methods for
inferring latent differential equations which, despite their constant memory
cost, have a time complexity that is heavily dependent on the stiffness of the
approximate differential equation. We achieve this computational advancement by
removing the need to solve differential equations when approximating gradients
using a novel amortization strategy coupled with a recently derived
reparametrization of expectations under linear SDEs. We show that, in practice,
this allows us to achieve similar performance to methods based on adjoint
sensitivities with more than an order of magnitude fewer evaluations of the
model in training.
( 2
min )
This paper studies the theoretical framework of the alignment process of
generative models with Reinforcement Learning from Human Feedback (RLHF). We
consider a standard mathematical formulation, the reverse-KL regularized
contextual bandit for RLHF. Despite its widespread practical application, a
rigorous theoretical analysis of this formulation remains open. We investigate
its theoretical properties both in offline and online settings and propose
efficient algorithms with finite-sample theoretical guarantees. Our work
bridges the gap between theory and practice by linking our theoretical insights
with existing practical alignment algorithms such as Direct Preference
Optimization (DPO) and Rejection Sampling Optimization (RSO). Furthermore,
these findings and connections also offer both theoretical and practical
communities new tools and insights for future algorithmic design of alignment
algorithms.
( 2
min )
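The DPO objective mentioned above is the closed-form solution of the reverse-KL regularized bandit and is standard in the literature; this sketch evaluates it for a single preference pair, treating the log-probabilities of the chosen and rejected responses as given scalars.

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (chosen, rejected) pair:
    -log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)]),
    where logp_* are policy log-probs and ref_logp_* are reference log-probs."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return np.log1p(np.exp(-margin))  # numerically stable -log sigmoid

# The loss shrinks as the policy prefers the chosen response more strongly
# than the reference policy does.
no_pref = dpo_loss(-5.0, -5.0, -5.0, -5.0)     # zero margin: loss = log 2
clear_pref = dpo_loss(-2.0, -8.0, -5.0, -5.0)  # positive margin: smaller loss
```

The `beta` parameter is the reverse-KL regularization strength from the bandit formulation: larger `beta` penalizes deviation from the reference policy less per unit of preference margin.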
We derive a concentration bound of the type `for all $n \geq n_0$ for some
$n_0$' for TD(0) with linear function approximation. We work with online TD
learning with samples from a single sample path of the underlying Markov chain.
This makes our analysis significantly different from offline TD learning or TD
learning with access to independent samples from the stationary distribution of
the Markov chain. We treat TD(0) as a contractive stochastic approximation
algorithm, with both martingale and Markov noises. Markov noise is handled
using the Poisson equation and the lack of almost sure guarantees on
boundedness of iterates is handled using the concept of relaxed concentration
inequalities.
( 2
min )
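The setting above, online TD(0) with linear function approximation along a single sample path, can be sketched on a toy two-state chain (one-hot features are a special case of linear features). The chain, rewards, and step size are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state Markov chain, reward 1 in state 0 and 0 in state 1.
P = np.array([[0.5, 0.5],
              [0.5, 0.5]])
r = np.array([1.0, 0.0])
gamma = 0.9
phi = np.eye(2)  # one-hot feature map: a special case of linear features

# Online TD(0) along a single sample path of the chain.
w = np.zeros(2)
s = 0
alpha = 0.05
for _ in range(20000):
    s_next = rng.choice(2, p=P[s])
    td_error = r[s] + gamma * phi[s_next] @ w - phi[s] @ w
    w = w + alpha * td_error * phi[s]  # semi-gradient update
    s = s_next

# True values solve the Bellman equation v = r + gamma * P v.
v_true = np.linalg.solve(np.eye(2) - gamma * P, r)
```

Because the samples come from one correlated trajectory rather than i.i.d. draws from the stationary distribution, the updates carry Markov noise, which is exactly the difficulty the abstract's Poisson-equation analysis addresses.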
In the lead-up to next month’s CES trade show in Las Vegas, NVIDIA will unveil its latest advancements in artificial intelligence — including generative AI — and a spectrum of other cutting-edge technologies. Scheduled for Monday, Jan. 8, at 8 a.m. PT, the company’s special address will be publicly streamed. Save the date.
( 5
min )
NVIDIA DLSS 3.5 for realistic ray-traced visuals is now available on D5 Render, a real-time 3D creation software.
( 7
min )
This post was written in collaboration with Ankur Goyal and Karthikeyan Chokappa from PwC Australia’s Cloud & Digital business. Artificial intelligence (AI) and machine learning (ML) are becoming an integral part of systems and processes, enabling decisions in real time, thereby driving top and bottom-line improvements across organizations. However, putting an ML model into production […]
( 10
min )
Dementia diagnosis requires a series of different testing methods, which is
complex and time-consuming. Early detection of dementia is crucial as it can
prevent further deterioration of the condition. This paper utilizes a speech
recognition model to construct a dementia assessment system tailored for
Mandarin speakers during the picture description task. By training an
attention-based speech recognition model on voice data closely resembling
real-world scenarios, we have significantly enhanced the model's recognition
capabilities. Subsequently, we extracted the encoder from the speech
recognition model and added a linear layer for dementia assessment. We
collected Mandarin speech data from 99 subjects and acquired their clinical
assessments from a local hospital. We achieved an accuracy of 92.04% in
Alzheimer's disease detection and a mean absolute error of 9% in clinical
dementia rating score prediction.
( 2
min )
One of the challenges in deploying a machine learning model is that the
model's performance degrades as the operating environment changes. To maintain
the performance, streaming active learning is used, in which the model is
retrained by adding a newly annotated sample to the training dataset if the
prediction of the sample is not certain enough. Although many streaming active
learning methods have been proposed for classification, few efforts have been
made for regression problems, which are often handled in the industrial field.
In this paper, we propose to use the regression-via-classification framework
for streaming active learning for regression. Regression-via-classification
transforms regression problems into classification problems so that streaming
active learning methods proposed for classification problems can be applied
directly to regression problems. Experimental validation on four real data sets
shows that the proposed method can perform regression with higher accuracy at
the same annotation cost.
( 2
min )
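The regression-via-classification idea described above can be sketched in a few lines: discretize the continuous target into bins, train any probabilistic classifier, and request annotation only when the top-class probability is low. This is a minimal illustration with our own toy data, bin edges, and threshold, not the paper's implementation (a simple k-NN vote stands in for the classifier):

```python
import numpy as np

# Toy labelled pool: y = sin(x) + noise, target discretized into bins.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=200)

edges = np.linspace(-1.2, 1.2, 9)
labels = np.digitize(y, edges)                  # regression -> classes 0..9
centers = np.concatenate(([edges[0]], (edges[:-1] + edges[1:]) / 2, [edges[-1]]))

def class_probs(x, k=15):
    """k-NN vote over the labelled pool as a stand-in probabilistic classifier."""
    nn = np.argsort(np.abs(X[:, 0] - x))[:k]
    return np.bincount(labels[nn], minlength=len(centers)) / k

def should_annotate(x, threshold=0.6):
    """Streaming AL rule: query a label when the top class is uncertain."""
    return class_probs(x).max() < threshold

def predict(x):
    """Map the most probable bin back to a continuous value (bin midpoint)."""
    return centers[int(np.argmax(class_probs(x)))]
```

The same querying rule then transfers unchanged from classification to the discretized regression task, which is the point of the framework.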
A common approach to learning mobile health (mHealth) intervention policies
is linear Thompson sampling. Two desirable mHealth policy features are (1)
pooling information across individuals and time and (2) incorporating a
time-varying baseline reward. Previous approaches pooled information across
individuals but not time, failing to capture trends in treatment effects over
time. In addition, these approaches did not explicitly model the baseline
reward, which limited the ability to precisely estimate the parameters in the
differential reward model. In this paper, we propose a novel Thompson sampling
algorithm, termed "DML-TS-NNR", that leverages (1) nearest-neighbors to
efficiently pool information on the differential reward function across users
and time and (2) the Double Machine Learning (DML) framework to explicitly
model baseline rewards and stay agnostic to the supervised learning algorithms
used. By explicitly modeling baseline rewards, we obtain smaller confidence
sets for the differential reward parameters. We offer theoretical guarantees on
the pseudo-regret, which are supported by empirical results. Importantly, the
DML-TS-NNR algorithm demonstrates robustness to potential misspecifications in
the baseline reward model.
( 2
min )
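For readers unfamiliar with the base algorithm the abstract extends, here is a minimal sketch of plain linear Thompson sampling (the DML baseline model and nearest-neighbor pooling are beyond this illustration; all dimensions and constants are our own):

```python
import numpy as np

rng = np.random.default_rng(1)
d, v, noise = 3, 0.5, 0.1
theta_true = np.array([1.0, -0.5, 0.2])      # unknown reward parameter

B = np.eye(d)        # posterior precision (ridge prior)
b = np.zeros(d)      # accumulated reward-weighted contexts

for t in range(500):
    contexts = rng.normal(size=(2, d))           # one context per action
    mean = np.linalg.solve(B, b)
    cov = v**2 * np.linalg.inv(B)
    theta = rng.multivariate_normal(mean, cov)   # posterior sample
    a = int(np.argmax(contexts @ theta))         # act greedily on the sample
    r = contexts[a] @ theta_true + noise * rng.normal()
    B += np.outer(contexts[a], contexts[a])      # Bayesian linear update
    b += r * contexts[a]

estimate = np.linalg.solve(B, b)   # posterior mean after 500 rounds
```

Sampling from the posterior, rather than using its mean, is what drives exploration in Thompson sampling.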
Move recognition in abstracts is crucial for effectively locating content and
clarifying an article's structure. Existing move recognition algorithms lack the
ability to learn word position information to obtain contextual semantics. This
paper proposes a novel enhanced move recognition algorithm with an improved
pre-trained model and a gated network with attention mechanism for unstructured
abstracts of Chinese scientific and technological papers. The proposed
algorithm first performs summary data segmentation and vocabulary training. The
EP-ERNIE_AT-GRU framework is leveraged to incorporate word positional
information, facilitating deep semantic learning and targeted feature
extraction. Experimental results demonstrate that the proposed algorithm
achieves 13.37% higher accuracy on the split dataset than on the original
dataset and a 7.55% improvement in accuracy over the basic comparison model.
( 2
min )
While federated learning is promising for privacy-preserving collaborative
learning without revealing local data, it remains vulnerable to white-box
attacks and struggles to adapt to heterogeneous clients. Federated distillation
(FD), built upon knowledge distillation--an effective technique for
transferring knowledge from a teacher model to student models--emerges as an
alternative paradigm, which provides enhanced privacy guarantees and addresses
model heterogeneity. Nevertheless, challenges arise due to variations in local
data distributions and the absence of a well-trained teacher model, which leads
to misleading and ambiguous knowledge sharing that significantly degrades model
performance. To address these issues, this paper proposes a selective knowledge
sharing mechanism for FD, termed Selective-FD. It includes client-side
selectors and a server-side selector to accurately and precisely identify
knowledge from local and ensemble predictions, respectively. Empirical studies,
backed by theoretical insights, demonstrate that our approach enhances the
generalization capabilities of the FD framework and consistently outperforms
baseline methods.
( 2
min )
The influx of massive amounts of data from current and upcoming cosmological
surveys necessitates compression schemes that can efficiently summarize the
data with minimal loss of information. We introduce a method that leverages the
paradigm of self-supervised machine learning in a novel manner to construct
representative summaries of massive datasets using simulation-based
augmentations. Deploying the method on hydrodynamical cosmological simulations,
we show that it can deliver highly informative summaries, which can be used for
a variety of downstream tasks, including precise and accurate parameter
inference. We demonstrate how this paradigm can be used to construct summary
representations that are insensitive to prescribed systematic effects, such as
the influence of baryonic physics. Our results indicate that self-supervised
machine learning techniques offer a promising new approach for compression of
cosmological data as well as its analysis.
( 2
min )
Many functions characterising physical systems are additively separable. This
is the case, for instance, of mechanical Hamiltonian functions in physics,
population growth equations in biology, and consumer preference and utility
functions in economics. We consider the scenario in which a surrogate of a
function is to be tested for additive separability. The detection that the
surrogate is additively separable can be leveraged to improve further learning.
Hence, it is beneficial to have the ability to test for such separability in
surrogates. The mathematical approach is to test if the mixed partial
derivative of the surrogate is zero or, empirically, lower than a threshold. We
present and empirically compare eight methods for computing the mixed partial
derivative of a surrogate function.
( 2
min )
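The simplest member of this family of tests is easy to sketch: estimate the mixed partial derivative by central finite differences at random probe points and declare the surrogate additively separable if every estimate stays below a threshold. The step size, probe range, and threshold below are illustrative choices of ours:

```python
import numpy as np

def mixed_partial(f, x, y, h=1e-4):
    """Central finite-difference estimate of d^2 f / dx dy at (x, y)."""
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)

def is_separable(f, n_probes=100, threshold=1e-3, seed=0):
    """Declare separability if the mixed partial is ~0 at all probes."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-2, 2, size=(n_probes, 2))
    return all(abs(mixed_partial(f, x, y)) < threshold for x, y in pts)

# f(x, y) = x^2 + sin(y) is additively separable; f(x, y) = x * y is not.
assert is_separable(lambda x, y: x**2 + np.sin(y))
assert not is_separable(lambda x, y: x * y)
```

For f(x, y) = x·y the finite-difference estimate is exactly 1 at every point, so the test rejects separability everywhere.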
While the applications of coresets have been growing, barring a few exceptions
they have mostly been limited to unsupervised settings. We consider supervised
classification problems, and non-decomposable evaluation measures in such
settings. We show that stratified uniform sampling based coresets have
excellent empirical performance that is also backed by theoretical guarantees.
We focus on the F1 score and Matthews Correlation Coefficient, two widely used
non-decomposable objective functions that are nontrivial to optimize for and
show that uniform coresets attain a lower bound for coreset size, and have good
empirical performance, comparable with ``smarter'' coreset construction
strategies.
( 2
min )
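A stratified uniform-sampling coreset of the kind described above is straightforward: sample uniformly within each class so that rare classes survive, which matters for non-decomposable measures such as F1 or MCC. This is an illustrative sketch with our own sizing rule, not the paper's construction:

```python
import numpy as np

def stratified_coreset(X, y, size, seed=0):
    """Sample ~proportionally per class, with at least one point per class."""
    rng = np.random.default_rng(seed)
    idx = []
    classes, counts = np.unique(y, return_counts=True)
    per_class = np.maximum(1, (size * counts / len(y)).astype(int))
    for c, k in zip(classes, per_class):
        members = np.flatnonzero(y == c)
        idx.extend(rng.choice(members, size=min(k, len(members)), replace=False))
    return np.array(idx)

y = np.array([0] * 95 + [1] * 5)    # imbalanced toy labels
X = np.arange(100).reshape(-1, 1)
idx = stratified_coreset(X, y, size=20)
# Both classes are represented, unlike a plain uniform sample which
# would often miss the 5% minority class entirely.
assert set(y[idx].tolist()) == {0, 1}
```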
Theoretical guarantees in reinforcement learning (RL) are known to suffer
multiplicative blow-up factors with respect to the misspecification error of
function approximation. Yet, the nature of such \emph{approximation factors} --
especially their optimal form in a given learning problem -- is poorly
understood. In this paper we study this question in linear off-policy value
function estimation, where many open questions remain. We study the
approximation factor in a broad spectrum of settings, such as with the weighted
$L_2$-norm (where the weighting is the offline state distribution), the
$L_\infty$ norm, the presence vs. absence of state aliasing, and full vs.
partial coverage of the state space. We establish the optimal asymptotic
approximation factors (up to constants) for all of these settings. In
particular, our bounds identify two instance-dependent factors for the
$L_2(\mu)$ norm and only one for the $L_\infty$ norm, which are shown to
dictate the hardness of off-policy evaluation under misspecification.
( 2
min )
High-resolution image generation with Generative Artificial Intelligence
(GenAI) has immense potential but, due to the enormous capital investment
required for training, it is increasingly centralised to a few large
corporations, and hidden behind paywalls. This paper aims to democratise
high-resolution GenAI by advancing the frontier of high-resolution generation
while remaining accessible to a broad audience. We demonstrate that existing
Latent Diffusion Models (LDMs) possess untapped potential for higher-resolution
image generation. Our novel DemoFusion framework seamlessly extends open-source
GenAI models, employing Progressive Upscaling, Skip Residual, and Dilated
Sampling mechanisms to achieve higher-resolution image generation. The
progressive nature of DemoFusion requires more passes, but the intermediate
results can serve as "previews", facilitating rapid prompt iteration.
( 2
min )
We study monotone submodular maximization under general matroid constraints
in the online setting. We prove that online optimization of a large class of
submodular functions, namely, weighted threshold potential functions, reduces
to online convex optimization (OCO). This is precisely because functions in
this class admit a concave relaxation; as a result, OCO policies, coupled with
an appropriate rounding scheme, can be used to achieve sublinear regret in the
combinatorial setting. We show that our reduction extends to many different
versions of the online learning problem, including the dynamic regret, bandit,
and optimistic-learning settings.
( 2
min )
The aim of this paper is to provide a theoretically founded investigation of
state-of-the-art learning approaches for inverse problems. We give an extended
definition of regularization methods and their convergence in terms of the
underlying data distributions, which paves the way for future theoretical
studies. Based on a simple spectral learning model previously introduced for
supervised learning, we investigate some key properties of different learning
paradigms for inverse problems, which can be formulated independently of
specific architectures. In particular we investigate the regularization
properties, bias, and critical dependence on training data distributions.
Moreover, our framework allows us to highlight and compare the specific behavior
of the different paradigms in the infinite-dimensional limit.
( 2
min )
In this work we introduce $\nu^2$-Flows, an extension of the $\nu$-Flows
method to final states containing multiple neutrinos. The architecture can
natively scale for all combinations of object types and multiplicities in the
final state for any desired neutrino multiplicities. In $t\bar{t}$ dilepton
events, the momenta of both neutrinos and correlations between them are
reconstructed more accurately than when using the most popular standard
analytical techniques, and solutions are found for all events. Inference time
is significantly faster than competing methods, and can be reduced further by
evaluating in parallel on graphics processing units. We apply $\nu^2$-Flows to
$t\bar{t}$ dilepton events and show that the per-bin uncertainties in unfolded
distributions are much closer to the limit of performance set by perfect
neutrino reconstruction than standard techniques. For the chosen double
differential observables $\nu^2$-Flows results in improved statistical
precision for each bin by a factor of 1.5 to 2 in comparison to the Neutrino
Weighting method and up to a factor of four in comparison to the Ellipse
approach.
( 3
min )
The community has explored building private inference frameworks for
transformer-based large language models (LLMs) in a server-client setting,
where the server holds the model parameters and the client inputs its private
data (or prompt) for inference. However, these frameworks impose significant
overhead when the private inputs are forward propagated through the original
LLMs. In this paper, we show that substituting the computation- and
communication-heavy operators in the transformer architecture with
privacy-computing friendly approximations can greatly reduce the private
inference costs while having only a minor impact on model performance.
Compared to state-of-the-art Iron (NeurIPS 2022), our privacy-computing
friendly model inference pipeline achieves a $5\times$ acceleration in
computation and an 80% reduction in communication overhead, while retaining
nearly identical accuracy.
( 2
min )
In the field of clinical medicine, computed tomography (CT) is an effective
medical imaging modality for the diagnosis of various pathologies. Compared
with X-ray images, CT images can provide more information, including
multi-planar slices and three-dimensional structures for clinical diagnosis.
However, CT imaging requires patients to be exposed to large doses of ionizing
radiation for a long time, which may cause irreversible physical harm. In this
paper, we propose an Uncertainty-aware MedNeRF (UMedNeRF) network based on
generative radiance fields. The network can learn a continuous representation
of CT projections from 2D X-ray images by obtaining the internal structure and
depth information and using adaptive loss weights to ensure the quality of the
generated images. Our model is trained on publicly available knee and chest
datasets, and we show the results of CT projection rendering with a single
X-ray and compare our method with other methods based on generative radiance
fields.
( 2
min )
Biomedical entity linking (BioEL) has achieved remarkable progress with the
help of pre-trained language models. However, existing BioEL methods usually
struggle to handle rare and difficult entities due to long-tailed distribution.
To address this limitation, we introduce a new scheme $k$NN-BioEL, which
provides a BioEL model with the ability to reference similar instances from the
entire training corpus as clues for prediction, thus improving the
generalization capabilities. Moreover, we design a contrastive learning
objective with dynamic hard negative sampling (DHNS) that improves the quality
of the retrieved neighbors during inference. Extensive experimental results
show that $k$NN-BioEL outperforms state-of-the-art baselines on several
datasets.
( 2
min )
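The abstract does not spell out the exact combination rule, but a common kNN-augmented prediction scheme (in the style of kNN-LM) interpolates the model's distribution with a distribution induced by retrieved neighbors. The following sketch, with invented names and constants, illustrates that pattern rather than $k$NN-BioEL itself:

```python
import numpy as np

def knn_augmented_probs(p_model, query, bank_keys, bank_labels,
                        n_classes, k=4, lam=0.5, tau=1.0):
    """Mix model probabilities with a softmax over neighbor distances."""
    dists = np.linalg.norm(bank_keys - query, axis=1)
    nn = np.argsort(dists)[:k]                 # k nearest training instances
    w = np.exp(-dists[nn] / tau)
    w /= w.sum()
    p_knn = np.zeros(n_classes)
    np.add.at(p_knn, bank_labels[nn], w)       # vote by neighbor labels
    return lam * p_model + (1 - lam) * p_knn

rng = np.random.default_rng(0)
bank_keys = rng.normal(size=(50, 8))           # cached training embeddings
bank_labels = rng.integers(0, 3, size=50)      # their entity labels
query = bank_keys[0]                           # identical to a stored key
p = knn_augmented_probs(np.ones(3) / 3, query, bank_keys, bank_labels, 3)
```

Because the query coincides with a stored key, the retrieval term concentrates on that instance's label, pulling the mixed distribution away from the uniform model prediction.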
We present a deep Graph Convolutional Kernel Machine (GCKM) for
semi-supervised node classification in graphs. The method is built of two main
types of blocks: (i) We introduce unsupervised kernel machine layers
propagating the node features in a one-hop neighborhood, using implicit node
feature mappings. (ii) We specify a semi-supervised classification kernel
machine through the lens of the Fenchel-Young inequality. We derive an
effective initialization scheme and efficient end-to-end training algorithm in
the dual variables for the full architecture. The main idea underlying GCKM is
that, because of the unsupervised core, the final model can achieve higher
performance in semi-supervised node classification when few labels are
available for training. Experimental results demonstrate the effectiveness of
the proposed framework.
( 2
min )
Inverse reinforcement learning (IRL) is computationally challenging, with
common approaches requiring the solution of multiple reinforcement learning
(RL) sub-problems. This work motivates the use of potential-based reward
shaping to reduce the computational burden of each RL sub-problem. This work
serves as a proof of concept, and we hope it will inspire future developments
towards computationally efficient IRL.
( 2
min )
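Potential-based reward shaping, as used above, replaces the reward r with r' = r + γΦ(s') − Φ(s) for any potential function Φ, a transformation known to preserve optimal policies. A minimal sketch on a toy chain MDP (the potential below is our own illustrative choice):

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s)."""
    return r + gamma * phi(s_next) - phi(s)

# Toy chain MDP with states 0..4 and goal state 4: use the negative
# distance to the goal as the potential.
phi = lambda s: -(4 - s)

# Moving toward the goal earns a positive shaping bonus...
assert shaped_reward(0.0, 2, 3, phi) > 0
# ...and moving away earns a negative one, densifying a sparse reward.
assert shaped_reward(0.0, 2, 1, phi) < 0
```

The denser signal is what can shrink the effort of each RL sub-problem inside an IRL loop.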
The promise of Mobile Health (mHealth) is the ability to use wearable sensors
to monitor participant physiology at high frequencies during daily life to
enable temporally-precise health interventions. However, a major challenge is
frequent missing data. Despite a rich imputation literature, existing
techniques are ineffective for the pulsative signals which comprise many
mHealth applications, and a lack of available datasets has stymied progress. We
address this gap with PulseImpute, the first large-scale pulsative signal
imputation challenge which includes realistic mHealth missingness models, an
extensive set of baselines, and clinically-relevant downstream tasks. Our
baseline models include a novel transformer-based architecture designed to
exploit the structure of pulsative signals. We hope that PulseImpute will
enable the ML community to tackle this significant and challenging task.
( 2
min )
Can a machine or algorithm discover or learn Kepler's first law from
astronomical sightings alone? We emulate Johannes Kepler's discovery of the
equation of the orbit of Mars with the Rudolphine tables using AI Feynman, a
physics-inspired tool for symbolic regression.
( 2
min )
Exact Bayesian inference on state-space models~(SSM) is in general
intractable, and unfortunately, basic Sequential Monte Carlo~(SMC) methods do
not yield correct approximations for complex models. In this paper, we propose
a mixed inference algorithm that computes closed-form solutions using belief
propagation as much as possible, and falls back to sampling-based SMC methods
when exact computations fail. This algorithm thus implements automatic
Rao-Blackwellization and is even exact for Gaussian tree models.
( 2
min )
Policy learning in robot-assisted surgery (RAS) lacks data-efficient and
versatile methods that exhibit the desired motion quality for delicate surgical
interventions. To this end, we introduce Movement Primitive Diffusion (MPD), a
novel method for imitation learning (IL) in RAS that focuses on gentle
manipulation of deformable objects. The approach combines the versatility of
diffusion-based imitation learning (DIL) with the high-quality motion
generation capabilities of Probabilistic Dynamic Movement Primitives (ProDMPs).
This combination enables MPD to achieve gentle manipulation of deformable
objects, while maintaining data efficiency critical for RAS applications where
demonstration data is scarce. We evaluate MPD across various simulated tasks
and a real-world robotic setup on both state and image observations. MPD
outperforms state-of-the-art DIL methods in success rate, motion quality, and
data efficiency.
( 2
min )
Venn Prediction (VP) is a new machine learning framework for producing
well-calibrated probabilistic predictions. In particular it provides
well-calibrated lower and upper bounds for the conditional probability of an
example belonging to each possible class of the problem at hand. This paper
proposes five VP methods based on Neural Networks (NNs), which are among the
most widely used machine learning techniques. The proposed methods are
evaluated experimentally on four benchmark datasets and the obtained results
demonstrate that their outputs are empirically well calibrated and superior to
the outputs of the traditional NN classifier.
( 2
min )
Artificial Intelligence (AI) based image analysis has an immense potential to
support diagnostic histopathology, including cancer diagnostics. However,
developing supervised AI methods requires large-scale annotated datasets. A
potentially powerful solution is to augment training data with synthetic data.
Latent diffusion models, which can generate high-quality, diverse synthetic
images, are promising. However, the most common implementations rely on
detailed textual descriptions, which are not generally available in this
domain. This work proposes a method that constructs structured textual prompts
from automatically extracted image features. We experiment with the PCam
dataset, composed of tissue patches only loosely annotated as healthy or
cancerous. We show that including image-derived features in the prompt, as
opposed to only healthy and cancerous labels, improves the Fr\'echet Inception
Distance (FID) from 178.8 to 90.2. We also show that pathologists find it
challenging to detect synthetic images, with a median sensitivity/specificity
of 0.55/0.55. Finally, we show that synthetic data effectively trains AI
models.
( 3
min )
Offline reinforcement learning leverages pre-collected datasets of
transitions to train policies. It can serve as effective initialization for
online algorithms, enhancing sample efficiency and speeding up convergence.
However, when such datasets are limited in size and quality, offline
pre-training can produce sub-optimal policies and lead to degraded online
reinforcement learning performance. In this paper we propose a model-based data
augmentation strategy to maximize the benefits of offline reinforcement
learning pre-training and reduce the scale of data needed to be effective. Our
approach leverages a world model of the environment trained on the offline
dataset to augment states during offline pre-training. We evaluate our approach
on a variety of MuJoCo robotic tasks and our results show it can jump-start
online fine-tuning and substantially reduce - in some cases by an order of
magnitude - the required number of environment interactions.
( 2
min )
This paper studies the problem of CPRP, concept prerequisite relation
prediction, which is a fundamental task in using AI for education. CPRP is
usually formulated into a link-prediction task on a relationship graph of
concepts and solved by training the graph neural network (GNN) model. However,
current directed GNNs fail to handle graph isomorphism, i.e., to distinguish
non-isomorphic graphs, which reduces the expressivity of the resulting
representations. We present a permutation-equivariant directed GNN model by
introducing the Weisfeiler-Lehman test into directed GNN learning. Our method
is then used for CPRP and evaluated on three public datasets. The experimental
results show that our model delivers better prediction performance than the
state-of-the-art methods.
( 2
min )
In this paper we propose a new method for training neural networks (NNs) for
frequency modulated continuous wave (FMCW) radar mutual interference
mitigation. Instead of training NNs to regress from interfered to clean radar
signals as in previous work, we train NNs directly on object detection maps. We
do so by performing a continuous relaxation of the cell-averaging constant
false alarm rate (CA-CFAR) peak detector, which is a well-established algorithm
for object detection using radar. With this new training objective we are able
to increase object detection performance by a large margin. Furthermore, we
introduce separable convolution kernels to strongly reduce the number of
parameters and computational complexity of convolutional NN architectures for
radar applications. We validate our contributions with experiments on
real-world measurement data and compare them against signal processing
interference mitigation methods.
( 2
min )
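The CA-CFAR detector central to the training objective above, and a sigmoid relaxation of its hard threshold comparison in the spirit of the paper, can be sketched on a 1-D range profile (window sizes, the scaling factor alpha, and the temperature are illustrative choices of ours):

```python
import numpy as np

def ca_cfar(x, n_train=8, n_guard=2, alpha=3.0, temperature=None):
    """Cell-averaging CFAR: compare each cell to alpha * local noise mean,
    estimated from training cells outside a guard band. With a temperature,
    the hard comparison becomes a differentiable sigmoid."""
    n = len(x)
    out = np.zeros(n)
    for i in range(n):
        lo = max(0, i - n_guard - n_train)
        hi = min(n, i + n_guard + n_train + 1)
        train = np.concatenate([x[lo:max(0, i - n_guard)],
                                x[min(n, i + n_guard + 1):hi]])
        noise = train.mean() if len(train) else 0.0
        margin = x[i] - alpha * noise
        if temperature is None:
            out[i] = float(margin > 0)                        # hard detection
        else:
            out[i] = 1 / (1 + np.exp(-margin / temperature))  # relaxation
    return out

signal = np.ones(64) * 0.1
signal[32] = 5.0                       # a single strong target in noise
hard = ca_cfar(signal)
soft = ca_cfar(signal, temperature=0.5)
```

The relaxed output varies smoothly with the input, which is what allows gradients from a detection-map loss to flow back into an upstream network.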
This paper presents a method for learning Hamiltonian dynamics from a limited
set of data points. The Hamiltonian vector field is found by regularized
optimization over a reproducing kernel Hilbert space of vector fields that are
inherently Hamiltonian, and where the vector field is required to be odd or
even. This is done with a symplectic kernel, and it is shown how this
symplectic kernel can be modified to be odd or even. The performance of the
method is validated in simulations for two Hamiltonian systems. It is shown
that the learned dynamics are Hamiltonian, and that the learned Hamiltonian
vector field can be prescribed to be odd or even.
( 2
min )
Congenital heart disease (CHD) is a relatively rare disease that affects
patients at birth and results in extremely heterogeneous anatomical and
functional defects. 12-lead ECG signal is routinely collected in CHD patients
because it provides significant biomarkers for disease prognosis. However,
developing accurate machine learning models is challenging due to the lack of
large available datasets. Here, we suggest exploiting the Riemannian geometry
of the spatial covariance structure of the ECG signal to improve
classification. Firstly, we use covariance augmentation to mix samples across
the Riemannian geodesic between corresponding classes. Secondly, we propose
projecting the covariance matrices onto their respective class Riemannian means
to
enhance the quality of feature extraction via tangent space projection. We
perform several ablation experiments and demonstrate significant improvement
compared to traditional machine learning models and deep learning on ECG time
series data.
( 2
min )
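The tangent-space projection of SPD covariance matrices mentioned above can be sketched with the standard log map; for simplicity this sketch projects at the arithmetic mean rather than the class-wise Riemannian means the abstract describes:

```python
import numpy as np

def _sym_pow(S, p):
    """Matrix power of a symmetric positive-definite matrix via eigh."""
    w, V = np.linalg.eigh(S)
    return (V * w**p) @ V.T

def tangent_project(cov, ref):
    """Log-map an SPD matrix to the tangent space at reference `ref`:
    logm(ref^{-1/2} cov ref^{-1/2})."""
    inv_sqrt = _sym_pow(ref, -0.5)
    M = inv_sqrt @ cov @ inv_sqrt
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T      # matrix log of the whitened matrix

rng = np.random.default_rng(0)
covs = [np.cov(rng.normal(size=(4, 100))) for _ in range(5)]
ref = sum(covs) / len(covs)           # simple stand-in for the Riemannian mean
feats = [tangent_project(c, ref) for c in covs]
```

The projected matrices live in a vector space, so their upper-triangular entries can be fed directly to ordinary classifiers.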
Despite being a unique source of information on patients' status and disease
progression, clinical notes are characterized by high levels of duplication and
information redundancy. In general domain text, it has been shown that
deduplication does not harm language model (LM) pretraining, thus helping
reduce the training cost. Although large LMs have proven to learn medical
knowledge, they still require specialized domain adaptation for improved
downstream clinical tasks. By leveraging large real-world clinical corpora, we
first provided a fine-grained characterization of duplicates stemming from
common writing practices and clinical relevancy. Second, we demonstrated that
deduplicating clinical text can help clinical LMs encode less redundant
information in a more efficient manner without harming performance on
classification tasks via prompt-based learning.
( 2
min )
Binary code summarization, while invaluable for understanding code semantics,
is challenging due to its labor-intensive nature. This study delves into the
potential of large language models (LLMs) for binary code comprehension. To
this end, we present BinSum, a comprehensive benchmark and dataset of over 557K
binary functions and introduce a novel method for prompt synthesis and
optimization. To more accurately gauge LLM performance, we also propose a new
semantic similarity metric that surpasses traditional exact-match approaches.
Our extensive evaluation of prominent LLMs, including ChatGPT, GPT-4, Llama 2,
and Code Llama, reveals 10 pivotal insights. This evaluation consumed 4 billion
inference tokens and incurred a total expense of 11,418 US dollars and 873
NVIDIA A100 GPU hours. Our findings highlight both the transformative potential
of LLMs in this field and the challenges yet to be overcome.
( 2
min )
Despite the remarkable advances in deep learning technology, achieving
satisfactory performance in lung sound classification remains a challenge due
to the scarcity of available data. Moreover, the respiratory sound samples are
collected from a variety of electronic stethoscopes, which could potentially
introduce biases into the trained models. When a significant distribution shift
occurs within the test dataset or in a practical scenario, it can substantially
decrease the performance. To tackle this issue, we introduce cross-domain
adaptation techniques, which transfer the knowledge from a source domain to a
distinct target domain. In particular, by considering different stethoscope
types as individual domains, we propose a novel stethoscope-guided supervised
contrastive learning approach. This method mitigates domain-related disparities
and thus enables the model to distinguish respiratory sounds despite recording
variation across stethoscopes. The experimental results on the ICBHI dataset
demonstrate that the proposed methods are effective in reducing the domain
dependency, achieving an ICBHI score of 61.71%, a significant improvement of
2.16% over the baseline.
( 2
min )
Our study explores modifications of Inception-like architectures within the
electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a novel
network that leverages the strengths of both InceptionTime and channel
attention mechanisms. Furthermore, we propose a training setup that employs
stabilization techniques aimed at tackling the severe class imbalance of the
PTB-XL dataset and gradient corruption. By this means, we set a new
supervised-learning state of the art across the majority of tasks. Our model
consistently surpasses InceptionTime and other state-of-the-art models in this
domain by substantial margins, notably a 0.013 AUROC improvement on the "all"
task, while also mitigating the inherent dataset fluctuations during training.
( 2
min )
$B_1^+$ and $B_0$ field-inhomogeneities can significantly reduce accuracy and
robustness of MRF's quantitative parameter estimates. Additional $B_1^+$ and
$B_0$ calibration scans can mitigate this but add scan time and cannot be
applied retrospectively to previously collected data. Here, we propose a
calibration-free sequence-adaptive deep-learning framework, to estimate and
correct for $B_1^+$ and $B_0$ effects of any MRF sequence. We demonstrate its
capability on arbitrary MRF sequences at 3T, where no training data were
previously obtained. This approach can be applied to any previously acquired
and future MRF scans. The flexibility in directly applying this framework to
other quantitative sequences is also highlighted.
( 2
min )
Uncertainty Quantification (UQ) has gained traction in an attempt to fix the
black-box nature of Deep Learning. Specifically (medical) biosignals such as
electroencephalography (EEG), electrocardiography (ECG), electrooculography
(EOG) and electromyography (EMG) could benefit from good UQ, since these suffer
from a poor signal-to-noise ratio, and good human interpretability is pivotal
for medical applications and Brain Computer Interfaces. In this paper, we
review the state of the art at the intersection of Uncertainty Quantification
and machine learning for biosignals. We present various methods, shortcomings,
uncertainty measures and theoretical frameworks that currently exist in this
application domain. Overall it can be concluded that promising UQ methods are
available, but that research is needed on how people and systems may interact
with an uncertainty model in a (clinical) environment.
( 2
min )
In this study, we propose an approach for predicting rare events by
exploiting time series in coevolution. Our approach involves a weighted
autologistic regression model, where we leverage the temporal behavior of the
data to enhance predictive capabilities. By addressing the issue of imbalanced
datasets, we establish constraints leading to weight estimation and to improved
performance. Evaluation on synthetic and real-world datasets confirms that our
approach outperforms state-of-the-art methods for predicting home equipment
failures.
( 2
min )
This study introduces an innovative 3D printed dry electrode tailored for
biosensing in postoperative recovery scenarios. Fabricated through a drop
coating process, the electrode incorporates a novel 2D material.
( 2
min )
Biased enhanced sampling methods utilizing collective variables (CVs) are
powerful tools for sampling conformational ensembles. Due to high intrinsic
dimensions, efficiently generating conformational ensembles for complex systems
requires enhanced sampling on high-dimensional free energy surfaces. While
methods like temperature-accelerated molecular dynamics (TAMD) can adopt many
CVs in a simulation, unbiasing the simulation requires accurate modeling of a
high-dimensional CV probability distribution, which is challenging for
traditional density estimation techniques. Here we propose an unbiasing method
based on the score-based diffusion model, a deep generative learning method
that excels in density estimation across complex data landscapes. We test the
score-based diffusion unbiasing method on TAMD simulations. The results
demonstrate that this unbiasing approach significantly outperforms traditional
unbiasing methods, and can generate accurate unbiased conformational ensembles
for simulations with more CVs than is typically tractable.
( 2
min )
Catastrophic forgetting (CF) is a significant challenge in continual learning
(CL). In regularization-based approaches to mitigate CF, modifications to
important training parameters are penalized in subsequent tasks using an
appropriate loss function. We propose RTRA, a modification to the widely
used Elastic Weight Consolidation (EWC) regularization scheme, using the
Natural Gradient for loss function optimization. Our approach improves the
training of regularization-based methods without sacrificing test-data
performance. We compare the proposed RTRA approach against EWC using the
iFood251 dataset. We show that RTRA has a clear edge over the state-of-the-art
approaches.
( 2
min )
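The EWC regularizer that RTRA modifies is compact enough to sketch: the loss for a new task adds a quadratic penalty weighting parameter drift by the (diagonal) Fisher information estimated on the previous task. All values below are illustrative:

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=10.0):
    """EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

theta_star = np.array([1.0, -2.0, 0.5])   # parameters learned on task A
fisher = np.array([4.0, 0.1, 1.0])        # per-parameter importance

# Drifting an important parameter is penalized more heavily than
# drifting an unimportant one by the same amount.
hi = ewc_penalty(theta_star + np.array([0.5, 0.0, 0.0]), theta_star, fisher)
lo = ewc_penalty(theta_star + np.array([0.0, 0.5, 0.0]), theta_star, fisher)
assert hi > lo
```

During training on a new task this penalty is simply added to the task loss; RTRA's contribution, per the abstract, is optimizing that combined loss with the natural gradient.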
Rehearsal-based techniques are commonly used to mitigate catastrophic
forgetting (CF) in Incremental learning (IL). The quality of the exemplars
selected is important for this purpose and most methods do not ensure the
appropriate diversity of the selected exemplars. We propose a new technique
"DSS" -- Diverse Selection of Samples from the input data stream in the
Class-incremental learning (CIL) setup under both disjoint and fuzzy task
boundary scenarios. Our method outperforms state-of-the-art methods and is much
simpler to understand and implement.
( 2
min )
We propose a novel exemplar selection approach based on Principal Component
Analysis (PCA) and median sampling, and a neural network training regime in the
setting of class-incremental learning. This approach avoids the pitfalls due to
outliers in the data and is both simple to implement and use across various
incremental machine learning models. It also has independent usage as a
sampling algorithm. We achieve better performance compared to state-of-the-art
methods.
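One plausible reading of the idea, offered as a hypothetical sketch rather than the authors' exact procedure: project each class onto its first principal component and keep the samples whose projections lie closest to the median, so tail outliers are never selected.

```python
import numpy as np

def select_exemplars(X, m):
    """Pick m exemplars for one class: project onto the first principal
    component and keep samples closest to the median projection, which
    avoids outliers at the tails of the distribution."""
    Xc = X - X.mean(axis=0)
    # First right singular vector = first principal component
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[0]
    med = np.median(proj)
    return np.argsort(np.abs(proj - med))[:m]
```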
( 2
min )
The goal of this series is to chronicle opinions and issues in the field of
machine learning as they stand today and as they change over time. The plan is
to host this survey periodically until the AI singularity
paperclip-frenzy-driven doomsday, keeping an updated list of topical questions
and interviewing new community members for each edition. In this issue, we
probed people's opinions on interpretable AI, the value of benchmarking in
modern NLP, the state of progress towards understanding deep learning, and the
future of academia.
( 2
min )
In this survey, we examine algorithms for conducting credit assignment in
artificial neural networks that are inspired or motivated by neurobiology,
unifying these various processes under one possible taxonomy. Our proposed
taxonomy is constructed based on how a learning algorithm answers a central
question underpinning the mechanisms of synaptic plasticity in complex adaptive
neuronal systems: where do the signals that drive the learning in individual
elements of a network come from and how are they produced? In this unified
treatment, we organize the ever-growing set of brain-inspired learning
processes into six general families and consider these in the context of
backpropagation of errors and its known criticisms. The results of this review
are meant to encourage future developments in neuro-mimetic systems and their
constituent learning processes, wherein lies the opportunity to build a strong
bridge between machine learning, computational neuroscience, and cognitive
science.
( 2
min )
In this paper we consider the adversarial contextual bandit problem in metric
spaces. The paper "Nearest neighbour with bandit feedback" tackled this
problem, but its algorithm suffers high regret when many contexts lie near the
decision boundary of the comparator policy. In this paper we eliminate this
problem,
designing an algorithm in which we can hold out any set of contexts when
computing our regret term. Our algorithm builds on that of "Nearest neighbour
with bandit feedback" and hence inherits its extreme computational efficiency.
( 2
min )
Theoretical guarantees in reinforcement learning (RL) are known to suffer
multiplicative blow-up factors with respect to the misspecification error of
function approximation. Yet, the nature of such \emph{approximation factors} --
especially their optimal form in a given learning problem -- is poorly
understood. In this paper we study this question in linear off-policy value
function estimation, where many open questions remain. We study the
approximation factor in a broad spectrum of settings, such as with the weighted
$L_2$-norm (where the weighting is the offline state distribution), the
$L_\infty$ norm, the presence vs. absence of state aliasing, and full vs.
partial coverage of the state space. We establish the optimal asymptotic
approximation factors (up to constants) for all of these settings. In
particular, our bounds identify two instance-dependent factors for the
$L_2(\mu)$ norm and only one for the $L_\infty$ norm, which are shown to
dictate the hardness of off-policy evaluation under misspecification.
( 2
min )
Inverse reinforcement learning (IRL) is computationally challenging, with
common approaches requiring the solution of multiple reinforcement learning
(RL) sub-problems. This work motivates the use of potential-based reward
shaping to reduce the computational burden of each RL sub-problem. It serves
as a proof of concept that we hope will inspire future developments towards
computationally efficient IRL.
( 2
min )
There have been claims that artificial intelligence is bringing about increased productivity, accuracy, and a smarter workplace. In all of this excitement, it is difficult to differentiate between fact and fantasy. When it comes to the management of workforces, what is the truth there? Within the context of real-world applications, how much hype is there?
The post How can data science and AI help HR in workforce development, evaluation, and retention? appeared first on Data Science Central.
( 29
min )
Artificial intelligence (AI) is one of the most transformational technologies of our generation and provides opportunities to be a force for good and drive economic growth. The growth of large language models (LLMs), with hundreds of billions of parameters, has unlocked new generative AI use cases to improve customer experiences, boost employee productivity, and so […]
( 4
min )
This is a guest post co-written with Babu Srinivasan from MongoDB. As industries evolve in today’s fast-paced business landscape, the inability to have real-time forecasts poses significant challenges for industries heavily reliant on accurate and timely insights. The absence of real-time forecasts in various industries presents pressing business challenges that can significantly impact decision-making and […]
( 8
min )
In this episode of “AI Frontiers,” AI4Science Director Chris Bishop talks about the state of deep learning; his new textbook, “Deep Learning: Foundations and Concepts”; and the impact the field is having on the natural sciences.
The post AI Frontiers: A deep dive into deep learning with Ashley Llorens and Chris Bishop appeared first on Microsoft Research.
( 24
min )
Bilevel optimization has recently received increasing attention due to its
wide applications in machine learning. In this paper, we consider bilevel
optimization in decentralized networks. In particular, we propose a novel
single-loop algorithm for solving decentralized bilevel optimization with
strongly convex lower level problem. Our algorithm is fully single-loop and
does not require heavy matrix-vector multiplications when approximating the
hypergradient. Moreover, unlike existing methods for decentralized bilevel
optimization and federated bilevel optimization, our algorithm does not require
any gradient heterogeneity assumption. Our analysis shows that the proposed
algorithm achieves a sublinear convergence rate. Experimental results on
hyperparameter optimization problem with both synthetic and MNIST data sets
demonstrate the efficiency of the proposed algorithm.
( 2
min )
In part 1 of the series “A Different AI Scenario: AI and Justice in a Brave New World,” I outlined some requirements for the role that AI would play in enforcing our laws and regulations in a more just and fair manner and what our human legislators must do to ensure that outcome. In part…
The post AI and Justice in a Brave New World: Part 3 – AI Governance appeared first on Data Science Central.
( 23
min )
In recent years, Transformer-based self-attention mechanisms have been
successfully applied to the analysis of a variety of context-reliant data
types, from texts to images and beyond, including data from non-Euclidean
geometries. In this paper, we present such a mechanism, designed to classify
sequences of Symmetric Positive Definite matrices while preserving their
Riemannian geometry throughout the analysis. We apply our method to automatic
sleep staging on timeseries of EEG-derived covariance matrices from a standard
dataset, obtaining high levels of stage-wise performance.
( 2
min )
This paper introduces a physics-informed machine learning approach for
pathloss prediction. This is achieved by simultaneously including in the
training phase (i) physical dependencies of the spatial loss field and (ii)
pathloss values measured in the field. It is shown that the solution to a
proposed learning problem improves generalization and prediction quality with a
small number of neural network layers and parameters. The latter leads to fast
inference times which are favorable for downstream tasks such as localization.
Moreover, the physics-informed formulation allows training and prediction with
a small amount of training data which makes it appealing for a wide range of
practical pathloss prediction scenarios.
( 2
min )
Real-time monitoring of human behaviours, especially in e-Health
applications, has been an active area of research in the past decades. On top
of IoT-based sensing environments, anomaly detection algorithms have been
proposed for the early detection of abnormalities. Gradual changes, commonly
referred to as drift anomalies, have received much less attention in
the literature because they represent a much more challenging scenario than
sudden temporary changes (point anomalies). In this paper, we propose, for the
first time, a fully unsupervised real-time drift detection algorithm named
DynAmo, which can identify drift periods as they are happening. DynAmo
comprises a dynamic clustering component to capture the overall trends of
monitored behaviours and a trajectory generation component, which extracts
features from the densest cluster centroids. Finally, we apply an ensemble of
divergence tests on sliding reference and detection windows to detect drift
periods in the behavioural sequence.
( 2
min )
We propose a new method called the Metropolis-adjusted Mirror Langevin
algorithm for approximate sampling from distributions whose support is a
compact and convex set. This algorithm adds an accept-reject filter to the
Markov chain induced by a single step of the mirror Langevin algorithm (Zhang
et al., 2020), which is a basic discretisation of the mirror Langevin dynamics.
Due to the inclusion of this filter, our method is unbiased relative to the
target, while known discretisations of the mirror Langevin dynamics including
the mirror Langevin algorithm have an asymptotic bias. We give upper bounds for
the mixing time of the proposed algorithm when the potential is relatively
smooth, convex, and Lipschitz with respect to a self-concordant mirror
function. As a consequence of the reversibility of the Markov chain induced by
the algorithm, we obtain an exponentially better dependence on the error
tolerance for approximate sampling. We also present numerical experiments that
corroborate our theoretical findings.
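For intuition, the same accept-reject construction in the simpler Euclidean case (plain MALA on a standard Gaussian target, not the mirror variant studied in the paper) looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_pi(x):
    # Target: standard Gaussian, log pi(x) = -x^2/2 (up to a constant)
    return -x

def mala_step(x, h):
    """One Metropolis-adjusted Langevin step: the accept-reject filter
    removes the asymptotic bias of the unadjusted discretisation."""
    prop = x + h * grad_log_pi(x) + np.sqrt(2 * h) * rng.normal()

    def log_q(a, b):
        # Log density (up to a constant) of proposing a from b
        return -((a - b - h * grad_log_pi(b)) ** 2) / (4 * h)

    log_alpha = (-prop**2 / 2 + log_q(x, prop)) - (-x**2 / 2 + log_q(prop, x))
    return prop if np.log(rng.random()) < log_alpha else x
```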
( 2
min )
(1) The enhanced capability of Graph Neural Networks (GNNs) in unsupervised
community detection of clustered nodes is attributed to their capacity to
encode both the connectivity and feature information spaces of graphs. The
identification of latent communities holds practical significance in various
domains, from social networks to genomics. Current real-world performance
benchmarks are perplexing due to the multitude of decisions influencing GNN
evaluations for this task. (2) Three metrics are compared to assess the
consistency of algorithm rankings in the presence of randomness. The
consistency and quality of performance under hyperparameter optimisation and
under the default hyperparameters are evaluated. (3)
The results compare hyperparameter optimisation with default hyperparameters,
revealing a significant performance loss when neglecting hyperparameter
investigation. A comparison of metrics indicates that ties in ranks can
substantially alter the quantification of randomness. (4) Ensuring adherence to
the same evaluation criteria may result in notable differences in the reported
performance of methods for this task. The $W$ Randomness coefficient, based on
the Wasserstein distance, is identified as providing the most robust assessment
of randomness.
( 3
min )
We study vehicle dispatching in autonomous mobility on demand (AMoD) systems,
where a central operator assigns vehicles to customer requests or rejects them
with the aim of maximizing its total profit. Recent approaches use multi-agent
deep reinforcement learning (MADRL) to realize scalable yet performant
algorithms, but train agents based on local rewards, which distorts the reward
signal with respect to the system-wide profit, leading to lower performance. We
therefore propose a novel global-rewards-based MADRL algorithm for vehicle
dispatching in AMoD systems, which resolves existing goal conflicts between
the trained agents and the operator by assigning rewards to agents
leveraging a counterfactual baseline. Our algorithm shows statistically
significant improvements across various settings on real-world data compared to
state-of-the-art MADRL algorithms with local rewards. We further provide a
structural analysis which shows that the utilization of global rewards can
improve implicit vehicle balancing and demand forecasting abilities. Our code
is available at https://github.com/tumBAIS/GR-MADRL-AMoD.
( 2
min )
We propose a framework that leverages foundation models as teachers, guiding
a reinforcement learning agent to acquire semantically meaningful behavior
without human feedback. In our framework, the agent receives task instructions
grounded in a training environment from large language models. Then, a
vision-language model guides the agent in learning the multi-task
language-conditioned policy by providing reward feedback. We demonstrate that
our method can learn semantically meaningful skills in a challenging open-ended
MineDojo environment while prior unsupervised skill discovery methods struggle.
Additionally, we discuss observed challenges of using off-the-shelf foundation
models as teachers and our efforts to address them.
( 2
min )
We present several methods for predicting the dynamics of Hamiltonian systems
from discrete observations of their vector field. Each method is either
informed or uninformed of the Hamiltonian property. We empirically and
comparatively evaluate the methods and observe that knowledge that the system
is Hamiltonian can be exploited effectively, and that different methods strike
different trade-offs between efficiency and effectiveness for different
dynamical systems.
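A concrete illustration of the informed/uninformed distinction, on a toy harmonic oscillator with H = (q^2 + p^2)/2 rather than any of the paper's methods: an integrator that exploits the Hamiltonian structure (symplectic Euler) keeps the energy bounded, while plain explicit Euler lets it grow.

```python
def explicit_euler(q, p, h, steps):
    """Structure-uninformed: energy grows by a factor (1 + h^2) per step."""
    for _ in range(steps):
        q, p = q + h * p, p - h * q  # H = (q^2 + p^2)/2
    return q, p

def symplectic_euler(q, p, h, steps):
    """Structure-informed: update p with the old q, then q with the new p.
    This preserves the symplectic form, so energy stays bounded."""
    for _ in range(steps):
        p = p - h * q
        q = q + h * p
    return q, p
```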
( 2
min )
In real-world scenarios, classification models are often required to perform
robustly when predicting samples belonging to classes that have not appeared
during their training stage. Open Set Recognition addresses this issue by
devising models capable of detecting unknown classes from samples arriving
during the testing phase, while maintaining a good level of performance in the
classification of samples belonging to known classes. This review
comprehensively overviews the recent literature related to Open Set
Recognition, identifying common practices, limitations, and connections of this
field with other machine learning research areas, such as continual learning,
out-of-distribution detection, novelty detection, and uncertainty estimation.
Our work also uncovers open problems and suggests several research directions
that may motivate and articulate future efforts towards safer Artificial
Intelligence methods.
( 2
min )
Humanoid robots will be able to assist humans in their daily life, in
particular due to their versatile action capabilities. However, while these
robots need a certain degree of autonomy to learn and explore, they also should
respect various constraints, for access control and beyond. We explore the
novel field of incorporating privacy, security, and access control constraints
with robot task planning approaches. We report preliminary results on the
classical symbolic approach, deep-learned neural networks, and modern ideas
using large language models as a knowledge base. From analyzing their trade-offs,
we conclude that a hybrid approach is necessary, and thereby present a new use
case for the emerging field of neuro-symbolic artificial intelligence.
( 2
min )
In continual learning, networks confront a trade-off between stability and
plasticity when trained on a sequence of tasks. To bolster plasticity without
sacrificing stability, we propose a novel training algorithm called LRFR. This
approach optimizes network parameters in the null space of the past tasks'
feature representation matrix to guarantee stability. Concurrently, we
judiciously select only a subset of neurons in each layer of the network while
training individual tasks to learn the past tasks' feature representation
matrix in low-rank. This increases the null space dimension when designing
network parameters for subsequent tasks, thereby enhancing the plasticity.
Using CIFAR-100 and TinyImageNet as benchmark datasets for continual learning,
the proposed approach consistently outperforms state-of-the-art methods.
( 2
min )
We propose HAROOD as a short-range FMCW radar-based human activity classifier
and out-of-distribution (OOD) detector. It aims to classify human sitting,
standing, and walking activities and to detect any other moving or stationary
object as OOD. We introduce a two-stage network. The first stage is trained
with a novel loss function that includes intermediate reconstruction loss,
intermediate contrastive loss, and triplet loss. The second stage uses the
first stage's output as its input and is trained with cross-entropy loss. It
creates a simple classifier that performs the activity classification. On our
dataset collected by 60 GHz short-range FMCW radar, we achieve an average
classification accuracy of 96.51%. Also, we achieve an average AUROC of 95.04%
as an OOD detector. Additionally, our extensive evaluations demonstrate the
superiority of HAROOD over the state-of-the-art OOD detection methods in terms
of standard OOD detection metrics.
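Of the three first-stage loss terms, the triplet loss is the most compact to write down. A generic sketch with squared Euclidean distances and the margin as a hyperparameter, not necessarily the exact variant used here:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Pull the anchor toward the positive and push it away from the
    negative by at least `margin` (in squared Euclidean distance)."""
    d_ap = float(np.sum((anchor - positive) ** 2))
    d_an = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_ap - d_an + margin)
```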
( 2
min )
We address the Continual Learning (CL) problem, where a model has to learn a
sequence of tasks from non-stationary distributions while preserving prior
knowledge as it encounters new experiences. With the advancement of foundation
models, CL research has shifted focus from the initial learning-from-scratch
paradigm to the use of generic features from large-scale pre-training. However,
existing approaches to CL with pre-trained models only focus on separating the
class-specific features from the final representation layer and neglect the
power of intermediate representations that capture low- and mid-level features
naturally more invariant to domain shifts. In this work, we propose LayUP, a
new class-prototype-based approach to continual learning that leverages
second-order feature statistics from multiple intermediate layers of a
pre-trained network. Our method is conceptually simple, does not require any
replay buffer, and works out of the box with any foundation model. LayUP
improves over the state-of-the-art on four of the seven class-incremental
learning settings at a considerably reduced memory and computational footprint
compared with the next best baseline. Our results demonstrate that fully
exhausting the representational capacities of pre-trained models in CL goes far
beyond their final embeddings.
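The flavour of class-prototype classification with second-order feature statistics can be sketched as follows. This is a simplified single-representation sketch with hypothetical names; LayUP itself combines statistics from multiple intermediate layers of a pre-trained network.

```python
import numpy as np

def fit_prototypes(feats, labels, lam=1.0):
    """Class prototypes plus second-order statistics: a ridge-regularised
    Gram matrix whitens features before nearest-prototype scoring."""
    d = feats.shape[1]
    G = feats.T @ feats + lam * np.eye(d)
    protos = {y: feats[labels == y].mean(axis=0) for y in np.unique(labels)}
    return np.linalg.inv(G), protos

def predict(f, G_inv, protos):
    """Score each class by the whitened inner product with its prototype."""
    scores = {y: float(f @ G_inv @ c) for y, c in protos.items()}
    return max(scores, key=scores.get)
```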
( 2
min )
Deep Reinforcement Learning (DRL) has achieved remarkable advances in
sequential decision tasks. However, recent works have revealed that DRL agents
are susceptible to slight perturbations in observations. This vulnerability
raises concerns regarding the effectiveness and robustness of deploying such
agents in real-world applications. In this work, we propose a novel robust
reinforcement learning method called SortRL, which improves the robustness of
DRL policies against observation perturbations from the perspective of the
network architecture. We employ a novel architecture for the policy network
that incorporates global $l_\infty$ Lipschitz continuity and provide a
convenient method to enhance policy robustness based on the output margin.
Besides, a training framework is designed for SortRL, which solves given tasks
while maintaining robustness against $l_\infty$ bounded perturbations on the
observations. Several experiments are conducted to evaluate the effectiveness
of our method, including classic control tasks and video games. The results
demonstrate that SortRL achieves state-of-the-art robustness performance
against different perturbation strengths.
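The architectural idea, sorting as a 1-Lipschitz operation in the $l_\infty$ norm, can be sketched as follows. This is a generic construction rather than SortRL's exact network: if each row of W has $l_1$-norm at most 1, the affine map is 1-Lipschitz in $l_\infty$, and coordinate-wise sorting never increases $l_\infty$ distances.

```python
import numpy as np

def linf_lipschitz_layer(W, b, x):
    """y = sort(W_n @ x + b), where W_n is W with each row scaled into the
    l1 ball. The affine map is then 1-Lipschitz in the l-infinity norm,
    and sorting (a per-input permutation of coordinates) preserves that."""
    W_n = W / np.maximum(1.0, np.abs(W).sum(axis=1, keepdims=True))
    return np.sort(W_n @ x + b)
```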
( 2
min )
Many neural network architectures have been shown to be Turing Complete, and
can thus implement arbitrary algorithms. However, Transformers are unique in
that they can implement gradient-based learning algorithms \emph{under simple
parameter configurations}. A line of recent work shows that linear Transformers
naturally learn to implement gradient descent (GD) when trained on a linear
regression in-context learning task. But the linearity assumption (either in
the Transformer architecture or in the learning task) is far from realistic
settings where non-linear activations crucially enable Transformers to learn
complicated non-linear functions. In this paper, we provide theoretical and
empirical evidence that non-linear Transformers can, and \emph{in fact do},
learn to implement learning algorithms to learn non-linear functions in
context. Our results apply to a broad class of combinations of non-linear
architectures and non-linear in-context learning tasks. Interestingly, we show
that the optimal choice of non-linear activation depends in a natural way on
the non-linearity of the learning task.
( 2
min )
Melanoma is a type of cancer that begins in the cells controlling the pigment
of the skin, and it is often referred to as the most dangerous skin cancer.
Diagnosing melanoma can be time-consuming, and a recent increase in melanoma
incidents indicates a growing demand for a more efficient diagnostic process.
This paper presents a pipeline for melanoma diagnostics, leveraging two
convolutional neural networks, a diagnosis, and a prognosis model. The
diagnostic model is responsible for localizing malignant patches across whole
slide images and delivering a patient-level diagnosis as malignant or benign.
Further, the prognosis model utilizes the diagnostic model's output to provide
a patient-level prognosis as good or bad. The full pipeline has an F1 score of
0.79 when tested on data from the same distribution as it was trained on.
( 2
min )
Polyp segmentation, a challenging task in medical imaging, has seen numerous
proposed methods aimed at improving the quality of segmented masks. Currently,
state-of-the-art techniques yield impressive results. However, the sheer size
of these models poses challenges for practical industry applications. To
address this, we present a Knowledge Distillation framework, incorporating
attention supervision and the symmetrical guiding method. This framework is
designed to facilitate knowledge transfer from a teacher model to a more
compact student model with fewer parameters. Our experimental evaluation of the
framework assesses its effectiveness in enabling the student model to acquire
knowledge from the teacher efficiently. Additionally, our method serves to
prevent the student model from incorporating redundant features that could lead
to inaccurate predictions. Consequently, our method, boasting approximately 5
million parameters, achieves competitive results comparable to the
state-of-the-art approaches. The implementation can be found at:
https://github.com/huyquoctrinh/KDAS3
( 2
min )
In this work, we formally prove that, under certain conditions, if a neural
network is invariant to a finite group then its weights recover the Fourier
transform on that group. This provides a mathematical explanation for the
emergence of Fourier features -- a ubiquitous phenomenon in both biological and
artificial learning systems. The results hold even for non-commutative groups,
in which case the Fourier transform encodes all the irreducible unitary group
representations. Our findings have consequences for the problem of symmetry
discovery. Specifically, we demonstrate that the algebraic structure of an
unknown group can be recovered from the weights of a network that is at least
approximately invariant within certain bounds. Overall, this work contributes
to a foundation for an algebraic learning theory of invariant neural network
representations.
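The simplest instance is the cyclic group: a linear layer invariant to cyclic shifts has circulant weights, and the discrete Fourier transform diagonalises every circulant matrix, so Fourier modes appear as its eigenvectors. A small illustrative check, not the paper's general construction:

```python
import numpy as np

def circulant(c):
    """Weight matrix of a linear layer equivariant to cyclic shifts:
    every row is a rotation of the same kernel c."""
    n = len(c)
    return np.array([[c[(j - k) % n] for k in range(n)] for j in range(n)])
```

Each Fourier mode v_k with entries exp(2*pi*1j*j*k/n) is an eigenvector of circulant(c), with eigenvalue equal to the k-th DFT coefficient of c.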
( 2
min )
This article presents a new methodology for extracting intervals during which
a home is vacant from low-frequency electricity consumption data. The approach
combines multiple algorithms, including change point detection, classification,
period detection, and periodic spikes retrieval. It shows encouraging results
on both simulated and real consumption curves. This approach offers practical
insights for optimizing energy use and holds potential benefits for residential
consumers and utility companies in terms of energy cost reduction and
sustainability. Further research is needed to enhance its applicability in
diverse settings and with larger datasets.
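As a toy stand-in for the pipeline described above (with a hypothetical threshold and minimum duration, not the article's algorithms), vacancy candidates can be flagged as sustained low-consumption runs:

```python
import numpy as np

def vacancy_intervals(load, thresh=0.5, min_len=3):
    """Return (start, end) index pairs where consumption stays below a
    baseload threshold for at least min_len consecutive samples."""
    below = load < thresh
    intervals, start = [], None
    for i, b in enumerate(below):
        if b and start is None:
            start = i
        elif not b and start is not None:
            if i - start >= min_len:
                intervals.append((start, i))
            start = None
    if start is not None and len(load) - start >= min_len:
        intervals.append((start, len(load)))
    return intervals
```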
( 2
min )
In various scientific and engineering applications, there is typically an
approximate model of the underlying complex system, even though it contains
both aleatoric and epistemic uncertainties. In this paper, we present a
principled method to incorporate these approximate models as physics priors in
modeling, to prevent overfitting and enhance the generalization capabilities
of the trained models. Utilizing the structural risk minimization (SRM)
inductive principle pioneered by Vapnik, this approach structures the physics
priors into generalized regularizers. The experimental results demonstrate that
our method achieves up to two orders of magnitude of improvement in testing
accuracy.
( 2
min )
We present a novel deep learning method for estimating time-dependent
parameters in Markov processes through discrete sampling. Departing from
conventional machine learning, our approach reframes parameter approximation as
an optimization problem using the maximum likelihood approach. Experimental
validation focuses on parameter estimation in multivariate regression and
stochastic differential equations (SDEs). Theoretical results show that, under
specific conditions, the real solution is close to that of the SDE with
parameters approximated by our neural network. Our work contributes to SDE-based
model parameter estimation, offering a versatile tool for diverse fields.
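For the simplest case, constant drift with known diffusion, dX = theta*dt + sigma*dW, the maximum-likelihood view reduces to a Gaussian likelihood over increments. This is a toy sketch of the underlying principle; the paper estimates time-dependent parameters with a neural network.

```python
import numpy as np

def neg_log_lik(theta, x, dt, sigma):
    """Euler-Maruyama likelihood for dX = theta*dt + sigma*dW: increments
    are Gaussian with mean theta*dt and variance sigma^2*dt (constants dropped)."""
    inc = np.diff(x)
    return float(np.sum((inc - theta * dt) ** 2) / (2 * sigma**2 * dt))

def mle_drift(x, dt):
    """Closed-form maximiser of the likelihood above."""
    return float((x[-1] - x[0]) / (dt * (len(x) - 1)))
```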
( 2
min )
We introduce a new framework to detect perceptual bugs using a Long
Short-Term Memory (LSTM) network, which detects bugs in video games as
anomalies. The detected buggy frames are then clustered to determine the
category of the bug that occurred. The framework was evaluated on two First Person
Shooter (FPS) games. Results show the effectiveness of the framework.
( 2
min )
Cardiovascular diseases, particularly heart failure, are a leading cause of
death globally. The early detection of heart failure through routine
echocardiogram screenings is often impeded by the high cost and labor-intensive
nature of these procedures, a barrier that can mean the difference between life
and death. This paper presents ConFormer, a novel deep learning model designed
to automate the estimation of Ejection Fraction (EF) and Left Ventricular Wall
Thickness from echocardiograms. The implementation of ConFormer has the
potential to enhance preventative cardiology by enabling cost-effective,
accessible, and comprehensive heart health monitoring, thereby saving countless
lives. The source code is available at https://github.com/Aether111/ConFormer.
( 2
min )
Hypernetworks are meta neural networks that generate weights for a main
neural network in an end-to-end differentiable manner. Despite extensive
applications ranging from multi-task learning to Bayesian deep learning, the
problem of optimizing hypernetworks has not been studied to date. We observe
that classical weight initialization methods like Glorot & Bengio (2010) and He
et al. (2015), when applied directly on a hypernet, fail to produce weights for
the mainnet in the correct scale. We develop principled techniques for weight
initialization in hypernets, and show that they lead to more stable mainnet
weights, lower training loss, and faster convergence.
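The failure mode and its fix can be sketched with a "hyperfan-in"-style rule. This illustrates the idea rather than the paper's exact formulas: initialise the hypernet's output layer so the variance of the generated mainnet weights matches a fan-in scheme, instead of the inflated scale a naive Glorot/He init on the hypernet would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

def hyperfan_in_output_layer(hyper_in, main_fan_in, n_weights):
    """Output layer H of a linear hypernet: mainnet weights are w = H @ e
    with embedding e ~ N(0, I). Choosing Var(H_ij) = 1/(hyper_in * main_fan_in)
    gives Var(w_i) ~ 1/main_fan_in, i.e. a fan-in (He/Glorot-style) scale."""
    std = np.sqrt(1.0 / (hyper_in * main_fan_in))
    return rng.normal(0.0, std, size=(n_weights, hyper_in))
```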
( 2
min )
In this paper, we propose a novel personalized decision support system that
combines Theory of Mind (ToM) modeling and explainable Reinforcement Learning
(XRL) to provide effective and interpretable interventions. Our method
leverages DRL to provide expert action recommendations while incorporating ToM
modeling to understand users' mental states and predict their future actions,
enabling appropriate timing for intervention. To explain interventions, we use
counterfactual explanations based on RL's feature importance and users' ToM
model structure. Our proposed system generates accurate and personalized
interventions that are easily interpretable by end-users. We demonstrate the
effectiveness of our approach through a series of crowd-sourcing experiments in
a simulated team decision-making task, where our system outperforms control
baselines in terms of task performance. Our proposed approach is agnostic to
the task environment and RL model structure, and therefore has the potential to be
generalized to a wide range of applications.
( 2
min )
In many applications, such as scientific literature management, researcher
search, and social network analysis, Name Disambiguation (aiming at
disambiguating WhoIsWho) has been a challenging problem. In addition, the
growth of scientific literature makes the problem more difficult and urgent.
Although name disambiguation has been extensively studied in academia and
industry, the problem has not been solved well due to the clutter of data and
the complexity of the same name scenario. In this work, we aim to explore
models that can perform the task of name disambiguation using the network
structure that is intrinsic to the problem and present an analysis of the
models.
( 2
min )
The high dimensionality and complexity of neuroimaging data necessitate large
datasets to develop robust and high-performing deep learning models. However,
the neuroimaging field is notably hampered by the scarcity of such datasets. In
this work, we proposed a data augmentation and validation framework that
utilizes dynamic forecasting with Long Short-Term Memory (LSTM) networks to
enrich datasets. We extended multivariate time series data by predicting the
time courses of independent component networks (ICNs) in both one-step and
recursive configurations. The effectiveness of these augmented datasets was
then compared with the original data using various deep learning models
designed for chronological age prediction tasks. The results suggest that our
approach improves model performance, providing a robust solution to overcome
the challenges presented by the limited size of neuroimaging datasets.
( 2
min )
Motivated by policy gradient methods in the context of reinforcement
learning, we derive the first large deviation rate function for the iterates
generated by stochastic gradient descent for possibly non-convex objectives
satisfying a Polyak-Lojasiewicz condition. Leveraging the contraction principle
from large deviations theory, we illustrate the potential of this result by
showing how convergence properties of policy gradient with a softmax
parametrization and an entropy regularized objective can be naturally extended
to a wide spectrum of other policy parametrizations.
( 2
min )
We study Off-Policy Evaluation (OPE) in contextual bandit settings with large
action spaces. The benchmark estimators suffer from severe bias and variance
tradeoffs. Parametric approaches suffer from bias due to the difficulty of
specifying the correct model, whereas importance-weighting approaches suffer from high variance. To
overcome these limitations, Marginalized Inverse Propensity Scoring (MIPS) was
proposed to mitigate the estimator's variance via embeddings of an action.
Nevertheless, MIPS is unbiased only under the no-direct-effect assumption, which
requires that the action embedding completely mediates the effect of an action on the reward.
To overcome the dependency on these unrealistic assumptions, we propose a
Marginalized Doubly Robust (MDR) estimator. Theoretical analysis shows that the
proposed estimator is unbiased under weaker assumptions than MIPS while
reducing variance relative to MIPS. Empirical experiments confirm the
superiority of MDR over existing estimators in large action spaces.
( 2
min )
This paper introduces a physics-informed machine learning approach for
pathloss prediction. This is achieved by simultaneously including in the
training phase (i) physical dependencies in the spatial loss field and (ii)
measured pathloss values in the field. It is shown that the solution to a
proposed learning problem improves generalization and prediction quality with a
small number of neural network layers and parameters. The latter leads to fast
inference times which are favorable for downstream tasks such as localization.
Moreover, the physics-informed formulation allows training and prediction with
a small amount of training data which makes it appealing for a wide range of
practical pathloss prediction scenarios.
( 2
min )
A default assumption in reinforcement learning (RL) and optimal control is
that observations arrive at discrete time points on a fixed clock cycle. Yet,
many applications involve continuous-time systems where the time
discretization, in principle, can be managed. The impact of time discretization
on RL methods has not been fully characterized in existing theory, but a more
detailed analysis of its effect could reveal opportunities for improving
data-efficiency. We address this gap by analyzing Monte-Carlo policy evaluation
for LQR systems and uncover a fundamental trade-off between approximation and
statistical error in value estimation. Importantly, these two errors respond
differently to time discretization, leading to an optimal choice of temporal
resolution for a given data budget. These findings show that managing the
temporal resolution can provably improve policy evaluation efficiency in LQR
systems with finite data. Empirically, we demonstrate the trade-off in
numerical simulations of LQR instances and standard RL benchmarks for
non-linear continuous control.
( 2
min )
Gaussian process regression is a classical kernel method for function
estimation and data interpolation. In large data applications, computational
costs can be reduced using low-rank or sparse approximations of the kernel.
This paper investigates the effect of such kernel approximations on the
interpolation error. We introduce a unified framework to analyze Gaussian
process regression under important classes of computational misspecification:
Karhunen-Lo\`eve expansions that result in low-rank kernel approximations,
multiscale wavelet expansions that induce sparsity in the covariance matrix,
and finite element representations that induce sparsity in the precision
matrix. Our theory also accounts for epistemic misspecification in the choice
of kernel parameters.
( 2
min )
This paper considers the problem of evaluating an autonomous system's
competency in performing a task, particularly when working in dynamic and
uncertain environments. The inherent opacity of machine learning models, from
the perspective of the user, often described as a `black box', poses a
challenge. To overcome this, we propose using a measure called the surprise
index, which leverages available measurement data to quantify whether the
dynamic system performs as expected. We show that the surprise index can be
computed in closed form for dynamic systems when the joint distribution of the
observed evidence in the probabilistic model follows a multivariate Gaussian
distribution. We then apply it to a nonlinear
spacecraft maneuver problem, where actions are chosen by a reinforcement
learning agent and show it can indicate how well the trajectory follows the
required orbit.
( 2
min )
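For intuition: under the Gaussian assumption above, the set of outcomes less likely than the observed evidence is characterized by the Mahalanobis distance, so the surprise index reduces to a chi-squared tail probability. A minimal sketch under that assumption (not the authors' implementation):

```python
import numpy as np
from scipy.stats import chi2

def surprise_index(x, mean, cov):
    """Probability mass on outcomes less likely than x under N(mean, cov):
    values near 0 mean the observation is highly surprising."""
    d = np.atleast_1d(x) - np.atleast_1d(mean)
    m2 = d @ np.linalg.solve(cov, d)   # squared Mahalanobis distance
    return chi2.sf(m2, df=d.size)      # upper tail of chi-squared
```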
Predictive Process Monitoring (PPM) aims at leveraging historic process
execution data to predict how ongoing executions will continue up to their
completion. In recent years, PPM techniques for the prediction of the next
activities have matured significantly, mainly thanks to the use of Neural
Networks (NNs) as a predictor. While their performance is difficult to beat in
the general case, there are specific situations where background process
knowledge can be helpful. Such knowledge can be leveraged for improving the
quality of predictions for exceptional process executions or when the process
changes due to a concept drift. In this paper, we present a Symbolic[Neuro]
system that leverages background knowledge expressed in terms of a procedural
process model to offset the under-sampling in the training data. More
specifically, we make predictions using NNs with attention mechanism, an
emerging technology in the NN field. The system has been tested on several
real-life logs showing an improvement in the performance of the prediction
task.
( 2
min )
A large amount of effort has recently been put into understanding the barren
plateau phenomenon. In this perspective article, we face the increasingly loud
elephant in the room and ask a question that has been hinted at by many but not
explicitly addressed: Can the structure that allows one to avoid barren
plateaus also be leveraged to efficiently simulate the loss classically? We
present strong evidence that commonly used models with provable absence of
barren plateaus are also classically simulable, provided that one can collect
some classical data from quantum devices during an initial data acquisition
phase. This follows from the observation that barren plateaus result from a
curse of dimensionality, and that current approaches for solving them end up
encoding the problem into some small, classically simulable, subspaces. This
sheds serious doubt on the non-classicality of the information processing
capabilities of parametrized quantum circuits for barren plateau-free
landscapes and on the possibility of superpolynomial advantages from running
them on quantum hardware. We end by discussing caveats in our arguments, the
role of smart initializations, and by highlighting new opportunities that our
perspective raises.
( 3
min )
We propose a new method called the Metropolis-adjusted Mirror Langevin
algorithm for approximate sampling from distributions whose support is a
compact and convex set. This algorithm adds an accept-reject filter to the
Markov chain induced by a single step of the mirror Langevin algorithm (Zhang
et al., 2020), which is a basic discretisation of the mirror Langevin dynamics.
Due to the inclusion of this filter, our method is unbiased relative to the
target, while known discretisations of the mirror Langevin dynamics including
the mirror Langevin algorithm have an asymptotic bias. We give upper bounds for
the mixing time of the proposed algorithm when the potential is relatively
smooth, convex, and Lipschitz with respect to a self-concordant mirror
function. As a consequence of the reversibility of the Markov chain induced by
the algorithm, we obtain an exponentially better dependence on the error
tolerance for approximate sampling. We also present numerical experiments that
corroborate our theoretical findings.
( 2
min )
We present a novel deep learning method for estimating time-dependent
parameters in Markov processes through discrete sampling. Departing from
conventional machine learning, our approach reframes parameter approximation as
an optimization problem using the maximum likelihood approach. Experimental
validation focuses on parameter estimation in multivariate regression and
stochastic differential equations (SDEs). Theoretical results show that, under
specific conditions, the true solution is close to that of the SDE whose
parameters are approximated by our neural network. Our work contributes to SDE-based
model parameter estimation, offering a versatile tool for diverse fields.
( 2
min )
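To make the maximum-likelihood framing concrete, consider estimating a drift parameter of an SDE from discrete samples via the Euler-Maruyama transition density. This toy sketch uses a hypothetical scalar model dX = theta*X dt + sigma dW (not the authors' network-based estimator) to show the likelihood objective such an approximation would minimize:

```python
import numpy as np

def negloglik(theta, x, dt, sigma):
    """Negative log-likelihood of a path under the Euler-Maruyama
    discretisation x[k+1] ~ N(x[k] + theta*x[k]*dt, sigma**2 * dt)."""
    mu = x[:-1] + theta * x[:-1] * dt
    resid = x[1:] - mu
    return 0.5 * np.sum(resid**2) / (sigma**2 * dt)

# simulate a path with theta = -1; the likelihood should prefer it
rng = np.random.default_rng(0)
dt, sigma, theta_true = 0.01, 0.1, -1.0
x = np.empty(2000)
x[0] = 1.0
for k in range(1999):
    x[k + 1] = x[k] + theta_true * x[k] * dt + sigma * np.sqrt(dt) * rng.normal()
```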
We are excited to announce the launch of Amazon DocumentDB (with MongoDB compatibility) integration with Amazon SageMaker Canvas, allowing Amazon DocumentDB customers to build and use generative AI and machine learning (ML) solutions without writing code. Amazon DocumentDB is a fully managed native JSON document database that makes it straightforward and cost-effective to operate critical […]
( 9
min )
“Minimum viewing time” benchmark gauges image recognition complexity for AI systems by measuring the time needed for accurate human identification.
( 11
min )
Using generative AI, MIT chemists created a model that can predict the structures formed when a chemical reaction reaches its point of no return.
( 9
min )
Generative Artificial Intelligence (AI) is one of the most exciting
developments in Computer Science of the last decade. At the same time,
Reinforcement Learning (RL) has emerged as a very successful paradigm for a
variety of machine learning tasks. In this survey, we discuss the state of the
art, opportunities and open research questions in applying RL to generative AI.
In particular, we will discuss three types of applications, namely, RL as an
alternative way for generation without specified objectives; as a way for
generating outputs while concurrently maximizing an objective function; and,
finally, as a way of embedding desired characteristics, which cannot be easily
captured by means of an objective function, into the generative process. We
conclude the survey with an in-depth discussion of the opportunities and
challenges in this fascinating emerging area.
( 2
min )
In distributed training, communication often emerges as a bottleneck. In
response, we introduce Kimad, a solution that offers adaptive gradient
compression. By consistently monitoring bandwidth, Kimad refines compression
ratios to match specific neural network layer requirements. Our exhaustive
tests and proofs confirm Kimad's outstanding performance, establishing it as a
benchmark in adaptive compression for distributed deep learning.
( 2
min )
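Adaptive gradient compression of the kind Kimad performs can be built on a per-layer top-k sparsifier whose ratio is tuned to the measured bandwidth. A generic sketch (top-k is a common choice of compressor, not necessarily Kimad's exact scheme):

```python
import numpy as np

def topk_compress(grad, ratio):
    """Keep only the largest-magnitude fraction of gradient entries,
    zeroing the rest; `ratio` would be chosen per layer from bandwidth."""
    k = max(1, int(grad.size * ratio))
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    out = np.zeros_like(grad)
    out[idx] = grad[idx]
    return out
```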
Quantum neural networks (QNNs) and quantum kernels stand as prominent figures
in the realm of quantum machine learning, poised to leverage the nascent
capabilities of near-term quantum computers to surmount classical machine
learning challenges. Nonetheless, the training efficiency challenge poses a
limitation on both QNNs and quantum kernels, curbing their efficacy when
applied to extensive datasets. To confront this concern, we present a unified
approach: coreset selection, aimed at expediting the training of QNNs and
quantum kernels by distilling a judicious subset from the original training
dataset. Furthermore, we analyze the generalization error bounds of QNNs and
quantum kernels when trained on such coresets, unveiling performance comparable
with that of training on the complete original dataset. Through
systematic numerical simulations, we illuminate the potential of coreset
selection in expediting tasks encompassing synthetic data classification,
identification of quantum correlations, and quantum compiling. Our work offers
a useful way to improve diverse quantum machine learning models with a
theoretical guarantee while reducing the training cost.
( 2
min )
We present a new method for functional tissue unit segmentation at the
cellular level, which utilizes the latest deep learning semantic segmentation
approaches together with domain adaptation and semi-supervised learning
techniques. This approach minimizes the domain gap, the class imbalance, and
the influence of differing capture settings between the HPA and HubMAP datasets. The
presented approach achieves results comparable with the state of the art in
functional tissue unit segmentation at the cellular level. The source code is
available at https://github.com/VSydorskyy/hubmap_2022_htt_solution
( 2
min )
We consider decentralized learning for zero-sum games, where players only see
their payoff information and are agnostic to actions and payoffs of the
opponent. Previous works demonstrated convergence to a Nash equilibrium in this
setting using double time-scale algorithms under strong reachability
assumptions. We address the open problem of achieving an approximate Nash
equilibrium efficiently with an uncoupled and single time-scale algorithm under
weaker conditions. Our contribution is a rational and convergent algorithm,
utilizing Tsallis-entropy regularization in a value-iteration-based approach.
The algorithm learns an approximate Nash equilibrium in polynomial time,
requiring only the existence of a policy pair that induces an irreducible and
aperiodic Markov chain, thus considerably weakening past assumptions. Our
analysis leverages negative drift inequalities and introduces novel properties
of Tsallis entropy that are of independent interest.
( 2
min )
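For reference, the Tsallis entropy used as a regularizer above generalizes Shannon entropy with an index q; a minimal definition:

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1); the
    Shannon entropy is recovered in the limit q -> 1."""
    p = np.asarray(p, dtype=float)
    if q == 1.0:
        p = p[p > 0]
        return -np.sum(p * np.log(p))
    return (1.0 - np.sum(p ** q)) / (q - 1.0)
```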
This paper extends our previous method for COVID-19 diagnosis, proposing an
enhanced solution for detecting COVID-19 from computed tomography (CT) images.
To decrease model misclassifications, two key steps of image processing were
employed. Firstly, the uppermost and lowermost slices were removed, preserving
sixty percent of each patient's slices. Secondly, all slices underwent manual
cropping to emphasize the lung areas. Subsequently, resized CT scans (224 by
224) were input into an Xception transfer learning model. Leveraging Xception's
architecture and pre-trained weights, the modified model achieved binary
classification. Promising results on the COV19-CT database showcased higher
validation accuracy and macro F1 score at both the slice and patient levels
compared to our previous solution and alternatives on the same dataset.
( 2
min )
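The first preprocessing step, dropping the uppermost and lowermost slices while keeping sixty percent of each patient's stack, is straightforward; a sketch (the symmetric top/bottom split is an assumption):

```python
def keep_middle_slices(volume, frac=0.6):
    """Keep the middle `frac` of CT slices, dropping an equal number
    from the top and bottom of the stack."""
    n = len(volume)
    drop = int(round(n * (1.0 - frac) / 2.0))
    return volume[drop:n - drop]
```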
Cadastres from the 19th century are a complex as well as rich source for
historians and archaeologists, whose use presents them with great challenges.
For archaeological and historical remote sensing, we have trained several Deep
Learning models, CNNs as well as Vision Transformers, to extract large-scale
data from this knowledge representation. We present the principal results of
our work here, along with a demonstrator of our browser-based tool that
allows researchers and public stakeholders to quickly identify spots that
featured buildings in the 19th century Franciscean Cadastre. The tool not only
supports scholars and fellow researchers in building a better understanding of
the settlement history of the region of Styria, it also helps public
administration and fellow citizens to swiftly identify areas of heightened
sensitivity with regard to the cultural heritage of the region.
( 2
min )
Popular guidance for denoising diffusion probabilistic model (DDPM) linearly
combines distinct conditional models together to provide enhanced control over
samples. However, this approach overlooks nonlinear effects that become
significant when guidance scale is large. To address this issue, we propose
characteristic guidance, a novel method that provides non-linear correction for
classifier-free guided DDPMs. Such correction forces the guided DDPMs to
respect the Fokker-Planck equation of their underlying diffusion process, in a
way that is first-principle, training-free, derivative-free, and compatible
with existing sampling methods. Experiments show that characteristic guidance
is robust to various applications, offers enhanced control over sample
generation, suppresses color and exposure issues even for latent space
sampling, and can handle physics problems such as phase transitions.
( 2
min )
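The linear combination the paper corrects is the standard classifier-free guidance rule, which extrapolates from the unconditional toward the conditional noise prediction:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: for scale > 1 this extrapolates beyond
    the conditional prediction, where the nonlinear effects the paper
    addresses become significant."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```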
Likelihood-free inference is quickly emerging as a powerful tool to perform
fast/effective parameter estimation. We demonstrate a technique of optimizing
likelihood-free inference to make it even faster by marginalizing symmetries in
a physical problem. In this approach, physical symmetries, for example,
time-translation are learned using joint-embedding via self-supervised learning
with symmetry data augmentations. Subsequently, parameter inference is
performed using a normalizing flow where the embedding network is used to
summarize the data before conditioning the parameters. We present this approach
on two simple physical problems and we show faster convergence in a smaller
number of parameters compared to a normalizing flow that does not use a
pre-trained symmetry-informed representation.
( 2
min )
The utilization of deep learning-based object detection is an effective
approach to assist visually impaired individuals in avoiding obstacles. In this
paper, we implemented seven different YOLO object detection models, viz.,
YOLO-NAS (small, medium, large), YOLOv8, YOLOv7, YOLOv6, and
YOLOv5 and performed comprehensive evaluation with carefully tuned
hyperparameters, to analyze how these models performed on images containing
common daily-life objects presented on roads and sidewalks. After a systematic
investigation, YOLOv8 was found to be the best model, which reached a precision
of $80\%$ and a recall of $68.2\%$ on a well-known Obstacle Dataset which
includes images from VOC dataset, COCO dataset, and TT100K dataset along with
images collected by the researchers in the field. Despite being the latest
model and demonstrating better performance in many other applications, YOLO-NAS
was found to be suboptimal for the obstacle detection task.
( 2
min )
Sleep detection and annotation are crucial for researchers to understand
sleep patterns, especially in children. With modern wrist-worn watches
comprising built-in accelerometers, sleep logs can be collected. However, the
annotation of these logs into distinct sleep events: onset and wakeup, proves
to be challenging. These annotations must be automated, precise, and scalable.
We propose to model the accelerometer data using different machine learning
(ML) techniques such as support vectors, boosting, ensemble methods, and more
complex approaches involving LSTMs and Region-based CNNs. Later, we aim to
evaluate these approaches using the Event Detection Average Precision (EDAP)
score (similar to the IOU metric) to eventually compare the predictive power
and model performance.
( 2
min )
Safeguarding privacy in sensitive training data is paramount, particularly in
the context of generative modeling. This is typically done either through
differentially private stochastic gradient descent or by using a differentially
private metric for training models or generators. In this paper, we introduce a novel
differentially private generative modeling approach based on parameter-free
gradient flows in the space of probability measures. The proposed algorithm is
a new discretized flow which operates through a particle scheme, utilizing
drift derived from the sliced Wasserstein distance and computed in a private
manner. Our experiments show that compared to a generator-based model, our
proposed model can generate higher-fidelity data at a low privacy budget,
offering a viable alternative to generator-based approaches.
( 2
min )
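The drift of the proposed flow is derived from the sliced Wasserstein distance, which averages one-dimensional Wasserstein distances over random projections. A non-private Monte-Carlo sketch for equal-size point clouds (the private computation in the paper adds noise mechanisms not shown here):

```python
import numpy as np

def sliced_w2(X, Y, n_proj=200, seed=0):
    """Monte-Carlo sliced Wasserstein-2 distance between two equal-size
    point clouds: 1-D W2 reduces to comparing sorted projections."""
    rng = np.random.default_rng(seed)
    acc = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=X.shape[1])
        theta /= np.linalg.norm(theta)          # random direction on sphere
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        acc += np.mean((px - py) ** 2)
    return np.sqrt(acc / n_proj)
```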
Influenced mixed moving average fields are a versatile modeling class for
spatio-temporal data. However, their predictive distribution is not generally
known. Under this modeling assumption, we define a novel spatio-temporal
embedding and a theory-guided machine learning approach that employs a
generalized Bayesian algorithm to make ensemble forecasts. We employ Lipschitz
predictors and determine fixed-time and any-time PAC Bayesian bounds in the
batch learning setting. Performing causal forecasts is a highlight of our
methodology, as is its potential application to data with spatial and temporal
short- and long-range dependence. We then test the performance of our learning
methodology by using linear predictors and data sets simulated from a
spatio-temporal Ornstein-Uhlenbeck process.
( 2
min )
The randomly pivoted partial Cholesky algorithm (RPCholesky) computes a
factorized rank-k approximation of an N x N positive-semidefinite (psd) matrix.
RPCholesky requires only (k + 1) N entry evaluations and O(k^2 N) additional
arithmetic operations, and it can be implemented with just a few lines of code.
The method is particularly useful for approximating a kernel matrix.
This paper offers a thorough new investigation of the empirical and
theoretical behavior of this fundamental algorithm. For matrix approximation
problems that arise in scientific machine learning, experiments show that
RPCholesky matches or beats the performance of alternative algorithms.
Moreover, RPCholesky provably returns low-rank approximations that are nearly
optimal. The simplicity, effectiveness, and robustness of RPCholesky strongly
support its use in scientific computing and machine learning applications.
( 2
min )
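The claim that RPCholesky fits in a few lines of code is easy to verify; a sketch following the algorithm's description above (pivot sampled proportionally to the residual diagonal, one column of A touched per step):

```python
import numpy as np

def rpcholesky(A, k, seed=0):
    """Rank-k approximation A ~ F @ F.T of a psd matrix, reading only
    the diagonal of A plus one column per step ((k+1)N entries total)."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    F = np.zeros((N, k))
    d = np.array(np.diag(A), dtype=float)         # residual diagonal
    for j in range(k):
        i = rng.choice(N, p=d / d.sum())          # randomly pivoted step
        g = A[:, i] - F[:, :j] @ F[i, :j]         # residual column
        F[:, j] = g / np.sqrt(g[i])
        d = np.clip(d - F[:, j] ** 2, 0.0, None)  # update, guard round-off
    return F
```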
With the rise of voice search, how can businesses adapt their SEO strategies to optimize for conversational queries, backed by data-driven insights? Voice search is causing changes to occur in search engine optimization. Users are using more natural language and conversational queries with voice-activated devices. Businesses need to adjust SEO strategies for changing search behavior.… Read More »Voice Search Revolution: Data-Driven SEO Strategies for Future Success
The post Voice Search Revolution: Data-Driven SEO Strategies for Future Success appeared first on Data Science Central.
( 26
min )
The Energy and Climate Hack presented opportunities for students and companies to collaborate and develop innovative solutions.
( 8
min )
Amazon SageMaker Studio offers a broad set of fully managed integrated development environments (IDEs) for machine learning (ML) development, including JupyterLab, Code Editor based on Code-OSS (Visual Studio Code Open Source), and RStudio. It provides access to the most comprehensive set of tools for each step of ML development, from preparing data to building, training, […]
( 16
min )
This is a customer post jointly authored by ICL and AWS employees. ICL is a multi-national manufacturing and mining corporation based in Israel that manufactures products based on unique minerals and fulfills humanity’s essential needs, primarily in three markets: agriculture, food, and engineered materials. Their mining sites use industrial equipment that has to be monitored […]
( 8
min )
Amazon Comprehend is a natural-language processing (NLP) service that provides pre-trained and custom APIs to derive insights from textual data. Amazon Comprehend customers can train custom named entity recognition (NER) models to extract entities of interest, such as location, person name, and date, that are unique to their business. To train a custom model, you […]
( 8
min )
Text-to-image generation is a rapidly growing field of artificial intelligence with applications in a variety of areas, such as media and entertainment, gaming, ecommerce product visualization, advertising and marketing, architectural design and visualization, artistic creations, and medical imaging. Stable Diffusion is a text-to-image model that empowers you to create high-quality images within seconds. In November […]
( 9
min )
This post outlines the ETL pipeline we developed for feature processing for training and deploying a job recommender model at Talent.com. Our pipeline uses SageMaker Processing jobs for efficient data processing and feature extraction at a large scale. Feature extraction code is implemented in Python enabling the use of popular ML libraries to perform feature extraction at scale, without the need to port the code to use PySpark.
( 10
min )
This GFN Thursday is burning rubber with the latest Forza Horizon games from Microsoft Studios. Check them out on PC Game Pass. Plus, give the gift of cloud gaming with the latest membership bundle, which includes a free, three-month PC Game Pass subscription with the purchase of a six-month GeForce NOW Ultimate membership. It’s all Read article >
( 6
min )
We present Cross-Client Label Propagation (XCLP), a new method for
transductive federated learning. XCLP estimates a data graph jointly from the
data of multiple clients and computes labels for the unlabeled data by
propagating label information across the graph. To avoid clients having to
share their data with anyone, XCLP employs two cryptographically secure
protocols: secure Hamming distance computation and secure summation. We
demonstrate two distinct applications of XCLP within federated learning. In the
first, we use it in a one-shot way to predict labels for unseen test points. In
the second, we use it to repeatedly pseudo-label unlabeled training data in a
federated semi-supervised setting. Experiments on both real federated and
standard benchmark datasets show that in both applications XCLP achieves higher
classification accuracy than alternative approaches.
( 2
min )
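At its core, the label propagation XCLP performs on the jointly estimated graph is the classic iterative scheme of spreading one-hot seed labels along normalized edges; a plain non-private sketch (the secure protocols that keep client data hidden are omitted):

```python
import numpy as np

def propagate_labels(W, y, labeled, alpha=0.9, iters=100):
    """Spread labels from seed nodes over a weighted graph W.
    y holds class ids for seed nodes (entries elsewhere are ignored)."""
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))          # symmetric normalization
    Y0 = np.zeros((len(y), y.max() + 1))
    Y0[labeled, y[labeled]] = 1.0                # one-hot seed labels
    F = Y0.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y0     # diffuse, re-anchor seeds
    return F.argmax(axis=1)
```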
In this paper, we study the mistake bound of online kernel learning on a
budget. We propose a new budgeted online kernel learning model, called
Ahpatron, which significantly improves the mistake bound of previous work and
resolves the open problem posed by Dekel, Shalev-Shwartz, and Singer (2005). We
first present an aggressive variant of Perceptron, named AVP, a model without
budget, which uses an active updating rule. Then we design a new budget
maintenance mechanism, which removes half of the examples and projects the
removed examples onto a hypothesis space spanned by the remaining examples.
Ahpatron adopts the above mechanism to approximate AVP. Theoretical analyses
prove that Ahpatron has tighter mistake bounds, and experimental results show
that Ahpatron outperforms the state-of-the-art algorithms on the same or a
smaller budget.
( 2
min )
We present the first optimal rates for infinite-dimensional vector-valued
ridge regression on a continuous scale of norms that interpolate between $L_2$
and the hypothesis space, which we consider as a vector-valued reproducing
kernel Hilbert space. These rates allow us to treat the misspecified case in which
the true regression function is not contained in the hypothesis space. We
combine standard assumptions on the capacity of the hypothesis space with a
novel tensor product construction of vector-valued interpolation spaces in
order to characterize the smoothness of the regression function. Our upper
bound not only attains the same rate as real-valued kernel ridge regression,
but also removes the assumption that the target regression function is bounded.
For the lower bound, we reduce the problem to the scalar setting using a
projection argument. We show that these rates are optimal in most cases and
independent of the dimension of the output space. We illustrate our results for
the special case of vector-valued Sobolev spaces.
( 2
min )
We propose a novel algorithmic framework for distributional reinforcement
learning, based on learning finite-dimensional mean embeddings of return
distributions. We derive several new algorithms for dynamic programming and
temporal-difference learning based on this framework, provide asymptotic
convergence theory, and examine the empirical performance of the algorithms on
a suite of tabular tasks. Further, we show that this approach can be
straightforwardly combined with deep reinforcement learning, and obtain a new
deep RL agent that improves over baseline distributional approaches on the
Arcade Learning Environment.
( 2
min )
We present ELSA, a practical solution for creating deep networks that can
easily be deployed at different levels of sparsity. The core idea is to embed
one or more sparse networks within a single dense network as a proper subset of
the weights. At prediction time, any sparse model can be extracted effortlessly,
simply by zeroing out weights according to a predefined mask. ELSA is simple,
powerful and highly flexible. It can use essentially any existing technique for
network sparsification and network training. In particular, it does not
restrict the loss function, architecture or the optimization technique. Our
experiments show that ELSA's advantage of flexible deployment comes with no or
just a negligible reduction in prediction quality compared to the standard way
of using multiple sparse networks that are trained and stored independently.
( 2
min )
This paper presents a novel methodology for improving the performance of
machine learning based space traffic management tasks through the use of a
pre-trained orbit model. Taking inspiration from BERT-like self-supervised
language models in the field of natural language processing, we introduce
ORBERT, and demonstrate the ability of such a model to leverage large
quantities of readily available orbit data to learn meaningful representations
that can be used to aid in downstream tasks. As a proof of concept of this
approach we consider the task of all vs. all conjunction screening, phrased
here as a machine learning time series classification task. We show that
leveraging unlabelled orbit data leads to improved performance, and that the
proposed approach can be particularly beneficial for tasks where the
availability of labelled data is limited.
( 2
min )
In this paper, we introduce a novel analysis of neural networks based on
geometric (Clifford) algebra and convex optimization. We show that optimal
weights of deep ReLU neural networks are given by the wedge product of training
samples when trained with standard regularized loss. Furthermore, the training
problem reduces to convex optimization over wedge product features, which
encode the geometric structure of the training dataset. This structure is given
in terms of signed volumes of triangles and parallelotopes generated by data
vectors. The convex problem finds a small subset of samples via $\ell_1$
regularization to discover only relevant wedge product features. Our analysis
provides a novel perspective on the inner workings of deep neural networks and
sheds light on the role of the hidden layers.
( 2
min )
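The wedge-product features in question are signed volumes of simplices spanned by data vectors, which in coordinates are just determinants. A toy illustration for pairs of 2-D points (the function name and the restriction to pairs are illustrative assumptions):

```python
import numpy as np
from itertools import combinations

def wedge_features(X):
    """Signed areas x_i ^ x_j = det([x_i; x_j]) for all pairs of 2-D
    data vectors -- the simplest case of the wedge-product features."""
    return np.array([np.linalg.det(np.stack([X[i], X[j]]))
                     for i, j in combinations(range(len(X)), 2)])
```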
In this paper, we study the mistake bound of online kernel learning on a
budget. We propose a new budgeted online kernel learning model, called
Ahpatron, which significantly improves the mistake bound of previous work and
resolves the open problem posed by Dekel, Shalev-Shwartz, and Singer (2005). We
first present an aggressive variant of Perceptron, named AVP, a model without
budget, which uses an active updating rule. Then we design a new budget
maintenance mechanism, which removes a half of examples,and projects the
removed examples onto a hypothesis space spanned by the remaining examples.
Ahpatron adopts the above mechanism to approximate AVP. Theoretical analyses
prove that Ahpatron has tighter mistake bounds, and experimental results show
that Ahpatron outperforms the state-of-the-art algorithms on the same or a
smaller budget.
( 2
min )
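The budget-maintenance idea (drop half the support set, project the dropped contribution onto the span of what remains) can be sketched for the linear-kernel case; the function name and the choice of which half to drop are ours, not Ahpatron's:

```python
import numpy as np

def halve_and_project(support_X, alpha):
    """Hypothetical sketch of a budget-halving step: drop half of the
    support examples and project their contribution to the weight vector
    onto the span of the kept examples (linear kernel, via least squares)."""
    n = len(alpha)
    keep, drop = np.arange(n // 2), np.arange(n // 2, n)
    # contribution of the dropped examples to the hypothesis w = X^T alpha
    w_drop = support_X[drop].T @ alpha[drop]
    # best coefficients c so that X_keep^T c approximates w_drop
    c, *_ = np.linalg.lstsq(support_X[keep].T, w_drop, rcond=None)
    return support_X[keep], alpha[keep] + c
```

When the kept examples span the ambient space, the projection is lossless and the hypothesis is preserved exactly.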
The low-level spatial detail information and high-level semantic abstract
information are both essential to the semantic segmentation task. The features
extracted by the deep network can obtain rich semantic information, while a lot
of spatial information is lost. However, how to recover spatial detail
information effectively and fuse it with high-level semantics has not been well
addressed so far. In this paper, we propose a new architecture based on
Bilateral Segmentation Network (BiseNet) called Multi-scale Covariance Feature
Fusion Network (MCFNet). Specifically, this network introduces a new feature
refinement module and a new feature fusion module. Furthermore, a gating unit
named L-Gate is proposed to filter out invalid information and fuse multi-scale
features. We evaluate our proposed model on Cityscapes, CamVid datasets and
compare it with the state-of-the-art methods. Extensive experiments show that
our method achieves competitive success. On Cityscapes, we achieve 75.5% mIOU
with a speed of 151.3 FPS.
( 2
min )
We present the first optimal rates for infinite-dimensional vector-valued
ridge regression on a continuous scale of norms that interpolate between $L_2$
and the hypothesis space, which we consider as a vector-valued reproducing
kernel Hilbert space. These rates allow us to treat the misspecified case in which
the true regression function is not contained in the hypothesis space. We
combine standard assumptions on the capacity of the hypothesis space with a
novel tensor product construction of vector-valued interpolation spaces in
order to characterize the smoothness of the regression function. Our upper
bound not only attains the same rate as real-valued kernel ridge regression,
but also removes the assumption that the target regression function is bounded.
For the lower bound, we reduce the problem to the scalar setting using a
projection argument. We show that these rates are optimal in most cases and
independent of the dimension of the output space. We illustrate our results for
the special case of vector-valued Sobolev spaces.
( 2
min )
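For intuition, here is the standard vector-valued kernel ridge regression estimator with the simplest separable operator-valued kernel (a scalar Gaussian kernel shared across output coordinates); this is textbook KRR for illustration, not the paper's estimator or rates:

```python
import numpy as np

def krr_fit_predict(X, Y, X_test, lam=1e-6, gamma=10.0):
    """Vector-valued kernel ridge regression with a scalar Gaussian kernel
    applied jointly to all output coordinates: solve
    (K + lam*I) alpha = Y, then predict K(X_test, X) @ alpha."""
    def K(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    n = len(X)
    alpha = np.linalg.solve(K(X, X) + lam * np.eye(n), Y)  # (n, out_dim)
    return K(X_test, X) @ alpha
```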
In this paper, we provide novel tail bounds on the optimization error of
Stochastic Mirror Descent for convex and Lipschitz objectives. Our analysis
extends the existing tail bounds from the classical light-tailed Sub-Gaussian
noise case to heavier-tailed noise regimes. We study the optimization error of
the last iterate as well as the average of the iterates. We instantiate our
results in two important cases: a class of noise with exponential tails and one
with polynomial tails. A remarkable feature of our results is that they do not
require an upper bound on the diameter of the domain. Finally, we support our
theory with illustrative experiments that compare the behavior of the average
of the iterates with that of the last iterate in heavy-tailed noise regimes.
( 2
min )
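As an illustrative sketch (not the paper's algorithm verbatim), stochastic mirror descent on the probability simplex with the entropic mirror map reduces to exponentiated gradient updates; returning both the last iterate and the running average mirrors the two quantities whose tail behavior the paper studies:

```python
import numpy as np

def smd_simplex(grad_fn, x0, steps, eta):
    """Stochastic mirror descent with the entropic mirror map on the
    simplex: multiply by exp(-eta * gradient) and renormalize. Returns
    the last iterate and the average of the iterates."""
    x = np.asarray(x0, dtype=float).copy()
    avg = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)                # stochastic (sub)gradient oracle
        x = x * np.exp(-eta * g)      # mirror step in the dual space
        x /= x.sum()                  # Bregman projection back to the simplex
        avg += x
    return x, avg / steps
```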
The graduate students will aim to commercialize innovations in AI, machine learning, and data science.
( 8
min )
Study shows computational models trained to perform auditory tasks display an internal organization similar to that of the human auditory cortex.
( 9
min )
A new method enables optical devices that more closely match their design specifications, boosting accuracy and efficiency.
( 10
min )
Zipline isn’t just some pie-in-the-sky drone startup. The San Francisco-based company has completed more than 800,000 deliveries in seven countries since its start in 2011. It recently added services for Seattle’s Pagliacci Pizza, vitamin and supplement giant GNC, and large health systems like Intermountain Health, OhioHealth and Michigan Medicine. Zipline developed its drones — which […]
( 6
min )
Meeting notes are a crucial part of collaboration, yet they often fall through the cracks. Between leading discussions, listening closely, and typing notes, it’s easy for key information to slip away unrecorded. Even when notes are captured, they can be disorganized or illegible, rendering them useless. In this post, we explore how to use Amazon […]
( 8
min )
In this post, we showcase fine-tuning a Llama 2 model using a Parameter-Efficient Fine-Tuning (PEFT) method and deploy the fine-tuned model on AWS Inferentia2. We use the AWS Neuron software development kit (SDK) to access the AWS Inferentia2 device and benefit from its high performance. We then use a large model inference container powered by […]
( 10
min )
Machine learning (ML) models do not operate in isolation. To deliver value, they must integrate into existing production systems and infrastructure, which necessitates considering the entire ML lifecycle during design and development. ML operations, known as MLOps, focus on streamlining, automating, and monitoring ML models throughout their lifecycle. Building a robust MLOps pipeline demands cross-functional […]
( 13
min )
Axel Springer is the first publishing house globally to partner with us on a deeper integration of journalism in AI technologies.
( 2
min )
In this work, we present Transformer-based Powered Descent Guidance (T-PDG),
a scalable algorithm for reducing the computational complexity of the direct
optimization formulation of the spacecraft powered descent guidance problem.
T-PDG uses data from prior runs of trajectory optimization algorithms to train
a transformer neural network, which accurately predicts the relationship
between problem parameters and the globally optimal solution for the powered
descent guidance problem. The solution is encoded as the set of tight
constraints corresponding to the constrained minimum-cost trajectory and the
optimal final time of landing. By leveraging the attention mechanism of
transformer neural networks, large sequences of time series data can be
accurately predicted when given only the spacecraft state and landing site
parameters. When applied to the real problem of Mars powered descent guidance,
T-PDG reduces the time for computing the 3-degree-of-freedom fuel-optimal
trajectory, compared to lossless convexification, from the order of 1-8
seconds to less than 500 milliseconds. A safe and optimal solution is
guaranteed by including a feasibility check in T-PDG before returning the final
trajectory.
( 2
min )
We introduce a curriculum learning algorithm, Variational Automatic
Curriculum Learning (VACL), for solving challenging goal-conditioned
cooperative multi-agent reinforcement learning problems. We motivate our
paradigm through a variational perspective, where the learning objective can be
decomposed into two terms: task learning on the current task distribution, and
curriculum update to a new task distribution. Local optimization over the
second term suggests that the curriculum should gradually expand the training
tasks from easy to hard. Our VACL algorithm implements this variational
paradigm with two practical components, task expansion and entity progression,
which produces training curricula over both the task configurations as well as
the number of entities in the task. Experiment results show that VACL solves a
collection of sparse-reward problems with a large number of agents.
Particularly, using a single desktop machine, VACL achieves 98% coverage rate
with 100 agents in the simple-spread benchmark and reproduces the ramp-use
behavior originally shown in OpenAI's hide-and-seek project. Our project
website is at https://sites.google.com/view/vacl-neurips-2021.
( 2
min )
Multilinear Principal Component Analysis (MPCA) is a widely utilized method
for the dimension reduction of tensor data. However, the integration of MPCA
into federated learning remains unexplored in existing research. To tackle this
gap, this article proposes a Federated Multilinear Principal Component Analysis
(FMPCA) method, which enables multiple users to collaboratively reduce the
dimension of their tensor data while keeping each user's data local and
confidential. The proposed FMPCA method is guaranteed to have the same
performance as traditional MPCA. An application of the proposed FMPCA in
industrial prognostics is also demonstrated. Simulated data and a real-world
data set are used to validate the performance of the proposed method.
( 2
min )
This paper presents a novel algorithm that leverages Stochastic Gradient
Descent strategies in conjunction with Random Features to augment the
scalability of Conic Particle Gradient Descent (CPGD) specifically tailored for
solving sparse optimisation problems on measures. By formulating the CPGD steps
within a variational framework, we provide rigorous mathematical proofs
demonstrating the following key findings: (i) The total variation norms of the
solution measures along the descent trajectory remain bounded, ensuring
stability and preventing undesirable divergence; (ii) We establish a global
convergence guarantee with a convergence rate of
$\mathcal{O}(\log(K)/\sqrt{K})$ over $K$ iterations, showcasing the efficiency
and effectiveness of our algorithm; (iii) Additionally, we analyze and
establish local control over the first-order condition discrepancy,
contributing to a deeper understanding of the algorithm's behavior and
reliability in practical applications.
( 2
min )
Differentiating noisy, discrete measurements in order to fit an ordinary
differential equation can be unreasonably effective. Assuming square-integrable
noise and minimal flow regularity, we construct and analyze a finite-difference
differentiation filter and a Tikhonov-regularized least squares estimator for
the continuous-time parameter-linear system. Combining these contributions in
series, we obtain a finite-sample bound on mean absolute error of estimation.
As a by-product, we offer a novel analysis of stochastically perturbed
Moore-Penrose pseudoinverses.
( 2
min )
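The two-stage pipeline described above can be sketched in a few lines (names and the feature map are ours): first a central finite-difference filter estimates the derivative, then Tikhonov-regularized least squares fits the parameter-linear model $\dot{x}(t) \approx \Phi(x(t))\,\theta$:

```python
import numpy as np

def fit_parameter_linear_ode(t, x, features, lam=1e-6):
    """(1) Estimate x'(t) from samples via central finite differences;
    (2) fit x'(t) ~ Phi(x(t)) @ theta by Tikhonov-regularized least
    squares (regularized normal equations)."""
    dxdt = np.gradient(x, t)                       # finite-difference differentiation filter
    Phi = features(x)                              # (n, p) design matrix
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])   # Tikhonov regularization
    return np.linalg.solve(A, Phi.T @ dxdt)
```

On the noiseless exponential $\dot{x} = 0.7x$ with a one-feature design matrix, the estimator recovers the rate parameter to high accuracy.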
To address the bias of the canonical two-way fixed effects estimator for
difference-in-differences under staggered adoptions, Wooldridge (2021) proposed
the extended two-way fixed effects estimator, which adds many parameters.
However, this reduces efficiency. Restricting some of these parameters to be
equal helps, but ad hoc restrictions may reintroduce bias. We propose a machine
learning estimator with a single tuning parameter, fused extended two-way fixed
effects (FETWFE), that enables automatic data-driven selection of these
restrictions. We prove that under an appropriate sparsity assumption FETWFE
identifies the correct restrictions with probability tending to one. We also
prove the consistency, asymptotic normality, and oracle efficiency of FETWFE
for two classes of heterogeneous marginal treatment effect estimators under
either conditional or marginal parallel trends, and we prove consistency for
two classes of conditional average treatment effects under conditional parallel
trends. We demonstrate FETWFE in simulation studies and an empirical
application.
( 2
min )
Phi-2 is now accessible on the Azure model catalog. Its compact size and new innovations in model scaling and training data curation make it ideal for exploration around mechanistic interpretability, safety improvements, and fine-tuning experimentation on a variety of tasks.
The post Phi-2: The surprising power of small language models appeared first on Microsoft Research.
( 11
min )
The launch of ChatGPT and rise in popularity of generative AI have captured the imagination of customers who are curious about how they can use this technology to create new products and services on AWS, such as enterprise chatbots, which are more conversational. This post shows you how you can create a web UI, which […]
( 9
min )
Large language models (or LLMs) have become a topic of daily conversations. Their quick adoption is evident by the amount of time required to reach 100 million users, which has gone from “4.5yrs by facebook” to an all-time low of a mere “2 months by ChatGPT.” A generative pre-trained transformer (GPT) uses causal autoregressive updates […]
( 7
min )
Vodafone is transitioning from a telecommunications company (telco) to a technology company (TechCo) by 2025, with objectives of innovating faster, reducing costs, improving security, and simplifying operations. Thousands of engineers are being onboarded to contribute to this transition. By 2025, Vodafone plans to have 50% of its global workforce actively involved in software development, with […]
( 6
min )
Justin Solomon applies modern geometric techniques to solve problems in computer vision, machine learning, statistics, and beyond.
( 10
min )
The creative team at Moonshine Studio — an artist-focused visual effects (VFX) studio specializing in animation and motion design — was tasked to solve a problem.
( 7
min )
The expressivity of Graph Neural Networks (GNNs) can be entirely
characterized by appropriate fragments of the first-order logic. Namely, any
query of the two variable fragment of graded modal logic (GC2) interpreted over
labeled graphs can be expressed using a GNN whose size depends only on the
depth of the query. As pointed out by [Barceló et al., 2020; Grohe, 2021], this
description holds for a family of activation functions, leaving the possibility
for a hierarchy of logics expressible by GNNs depending on the chosen
activation function. In this article, we show that such hierarchy indeed exists
by proving that GC2 queries cannot be expressed by GNNs with polynomial
activation functions. This implies a separation between polynomial and popular
non-polynomial activations (such as Rectified Linear Units) and answers an open
question formulated by [Grohe, 2021].
( 2
min )
In the era of artificial intelligence, data is gold but costly to annotate.
The paper demonstrates a groundbreaking solution to this dilemma using ChatGPT
for text augmentation in sentiment analysis. We leverage ChatGPT's generative
capabilities to create synthetic training data that significantly improves the
performance of smaller models, making them competitive with, or even
outperforming, their larger counterparts. This innovation enables models to be
both efficient and effective, thereby reducing computational cost, inference
time, and memory usage without compromising on quality. Our work marks a key
advancement in the cost-effective development and deployment of robust
sentiment analysis models.
( 2
min )
The Chinese Space Station Telescope (abbreviated as CSST) is a future
advanced space telescope. Real-time identification of galaxy and nebula/star
cluster (abbreviated as NSC) images is of great value during the CSST survey. While
recent research on celestial object recognition has progressed, the rapid and
efficient identification of high-resolution local celestial images remains
challenging. In this study, we conducted galaxy and NSC image classification
research using deep learning methods based on data from the Hubble Space
Telescope. We built a Local Celestial Image Dataset and designed a deep
learning model named HR-CelestialNet for classifying galaxy and NSC
images. HR-CelestialNet achieved an accuracy of 89.09% on the testing set,
outperforming models such as AlexNet, VGGNet and ResNet, while demonstrating
faster recognition speeds. Furthermore, we investigated the factors influencing
CSST image quality and evaluated the generalization ability of HR-CelestialNet
on the blurry image dataset, demonstrating its robustness to low image quality.
The proposed method can enable real-time identification of celestial images
during the CSST survey mission.
( 2
min )
Assurance Cases (ACs) are an established approach in safety engineering to
argue quality claims in a structured way. In the context of quality assurance
for Machine Learning (ML)-based software components, ACs are also being
discussed and appear promising. Tools for operationalizing ACs do exist, yet
mainly focus on supporting safety engineers on the system level. However,
assuring the quality of an ML component within the system is commonly the
responsibility of data scientists, who are usually less familiar with these
tools. To address this gap, we propose a framework to support the
operationalization of ACs for ML components based on technologies that data
scientists use on a daily basis: Python and Jupyter Notebook. Our aim is to
make the process of creating ML-related evidence in ACs more effective. Results
from the application of the framework, documented through notebooks, can be
integrated into existing AC tools. We illustrate the application of the
framework on an example excerpt concerned with the quality of the test data.
( 3
min )
Training generative models to produce synthetic data is meant to provide a
privacy-friendly approach to data release. However, we get robust guarantees
only when models are trained to satisfy Differential Privacy (DP). Alas, this
is not the standard in industry as many companies use ad-hoc strategies to
empirically evaluate privacy based on the statistical similarity between
synthetic and real data. In this paper, we review the privacy metrics offered
by leading companies in this space and shed light on a few critical flaws in
reasoning about privacy entirely via empirical evaluations. We analyze the
undesirable properties of the most popular metrics and filters and demonstrate
their unreliability and inconsistency through counter-examples. We then present
a reconstruction attack, ReconSyn, which successfully recovers (i.e., leaks all
attributes of) at least 78% of the low-density train records (or outliers) with
only black-box access to a single fitted generative model and the privacy
metrics. Finally, we show that applying DP only to the model or using
low-utility generators does not mitigate ReconSyn as the privacy leakage
predominantly comes from the metrics. Overall, our work serves as a warning to
practitioners not to deviate from established privacy-preserving mechanisms.
( 2
min )
Communication networks able to withstand hostile environments are critically
important for disaster relief operations. In this paper, we consider a
challenging scenario where drones have been compromised in the supply chain,
during their manufacture, and harbour malicious software capable of
wide-ranging and infectious disruption. We investigate multi-agent deep
reinforcement learning as a tool for learning defensive strategies that
maximise communications bandwidth despite continual adversarial interference.
Using a public challenge for learning network resilience strategies, we propose
a state-of-the-art expert technique and study its superiority over deep
reinforcement learning agents. Correspondingly, we identify three specific
methods for improving the performance of our learning-based agents: (1)
ensuring each observation contains the necessary information, (2) using expert
agents to provide a curriculum for learning, and (3) paying close attention to
reward. We apply our methods and present a new mixed strategy enabling expert
and learning-based agents to work together and improve on all prior results.
( 2
min )
Can we learn policies in reinforcement learning without rewards? Can we learn
a policy just by trying to reach a goal state? We answer these questions
positively by proposing a multi-step procedure that first learns a world model
that goes backward in time, secondly generates goal-reaching backward
trajectories, thirdly improves those sequences using shortest path finding
algorithms, and finally trains a neural network policy by imitation learning.
We evaluate our method on a deterministic maze environment where the
observations are $64\times 64$ pixel bird's-eye images and show that it
consistently reaches several goals.
( 2
min )
SCGAN adds a similarity constraint between generated images and conditions as
a regularization term on generative adversarial networks. The similarity
constraint works as a tutor that instructs the generator network to comprehend the
differences of representations across conditions. We analyze how SCGAN works at a
deeper level. This understanding makes us realize that the similarity
constraint functions like the contrastive loss function. We believe that a
model with high understanding and intelligence measures the similarity between
images based on their structure and high level features, just like humans do.
Two major changes we applied to SCGAN in order to make a modified model are
using SSIM to measure similarity between images and applying contrastive loss
principles to the similarity constraint. The modified model performs better
using FID and FactorVAE metrics. The modified model also has better
generalisability compared to other models.
Keywords: Generative Adversarial Nets, Unsupervised Learning, Disentangled Representation Learning, Contrastive Disentanglement, SSIM
( 2
min )
The discovery of neural architectures from simple building blocks is a
long-standing goal of Neural Architecture Search (NAS). Hierarchical search
spaces are a promising step towards this goal but lack a unifying search space
design framework and typically only search over some limited aspect of
architectures. In this work, we introduce a unifying search space design
framework based on context-free grammars that can naturally and compactly
generate expressive hierarchical search spaces that are 100s of orders of
magnitude larger than common spaces from the literature. By enhancing and using
their properties, we effectively enable search over the complete architecture
and can foster regularity. Further, we propose an efficient hierarchical kernel
design for a Bayesian Optimization search strategy to efficiently search over
such huge spaces. We demonstrate the versatility of our search space design
framework and show that our search strategy can be superior to existing NAS
approaches. Code is available at
https://github.com/automl/hierarchical_nas_construction.
( 2
min )
We’re proud to have 100+ accepted papers at NeurIPS 2023, plus 18 workshops. Several submissions were chosen as oral presentations and spotlight posters, reflecting groundbreaking concepts, methods, or applications. Here’s an overview of those submissions.
The post NeurIPS 2023 highlights breadth of Microsoft’s machine learning innovation appeared first on Microsoft Research.
( 16
min )
The series aims to help policymakers create better oversight of AI in society.
( 12
min )
In today’s digital marketing world, things are changing fast, and artificial intelligence (AI) is a big part of that. Companies want to stay ahead, so they’re smartly choosing to get help from outside experts in digital marketing who use AI tools. This helps them make the most of what AI can do. AI is like…
The post Maximizing marketing potential: The AI-driven revolution in outsourced digital marketing appeared first on Data Science Central.
( 22
min )
Much has been said about the economic impact of AGI, and some of it is already being felt. But not much has been proposed about solutions. Specifically, what approaches should policymakers take? Here, I propose that policymakers should encourage two key trends, which together could alleviate the issues of AI: the gig economy and…
The post Universal basic income and the gig economy: A combined policy approach to alleviate the challenges of AI appeared first on Data Science Central.
( 21
min )
Great companies thrive on stories. Sid Siddeek, who runs NVIDIA’s venture capital arm, knows this well. Siddeek still remembers one of his first jobs, schlepping presentation materials from one investor meeting to another, helping the startup’s CEO and management team get the story out while working from a trailer that “shook when the door opened,” […]
( 7
min )
In part 1 of the series “A Different AI Scenario: AI and Justice in a Brave New World,” we outlined some requirements for the role that AI would play in enforcing our laws and regulations in a more just and fair manner and what our human legislators must do to ensure those more just and…
The post AI and Justice in a Brave New World Part 2 – Humanizing AI appeared first on Data Science Central.
( 22
min )
Finding classifiers robust to adversarial examples is critical for their safe
deployment. Determining the robustness of the best possible classifier under a
given threat model for a given data distribution and comparing it to that
achieved by state-of-the-art training methods is thus an important diagnostic
tool. In this paper, we find achievable information-theoretic lower bounds on
loss in the presence of a test-time attacker for multi-class classifiers on any
discrete dataset. We provide a general framework for finding the optimal 0-1
loss that revolves around the construction of a conflict hypergraph from the
data and adversarial constraints. We further define other variants of the
attacker-classifier game that determine the range of the optimal loss more
efficiently than the full-fledged hypergraph construction. Our evaluation
shows, for the first time, an analysis of the gap to optimal robustness for
classifiers in the multi-class setting on benchmark datasets.
( 2
min )
We explore colour versus shape goal misgeneralization originally demonstrated
by Di Langosco et al. (2022) in the Procgen Maze environment, where, given an
ambiguous choice, the agents seem to prefer generalization based on colour
rather than shape. After training over 1,000 agents in a simplified version of
the environment and evaluating them on over 10 million episodes, we conclude
that the behaviour can be attributed to the agents learning to detect the goal
object through a specific colour channel. This choice is arbitrary.
Additionally, we show how, due to underspecification, the preferences can
change when retraining the agents using exactly the same procedure except for
using a different random seed for the training run. Finally, we demonstrate the
existence of outliers in out-of-distribution behaviour based on training random
seed alone.
( 2
min )
The Classification Tree (CT) is one of the most common models in
interpretable machine learning. Although such models are usually built with
greedy strategies, in recent years, thanks to remarkable advances in
Mixed-Integer Programming (MIP) solvers, several exact formulations of the
learning problem have been developed. In this paper, we argue that some of the
most relevant ones among these training models can be encapsulated within a
general framework, whose instances are shaped by the specification of loss
functions and regularizers. Next, we introduce a novel realization of this
framework: specifically, we consider the logistic loss, handled in the MIP
setting by a piecewise-linear approximation, and couple it with
$\ell_1$-regularization terms. The resulting Optimal Logistic Tree model
numerically proves to be able to induce trees with enhanced interpretability
features and competitive generalization capabilities, compared to the
state-of-the-art MIP-based approaches.
( 2
min )
We report the effects of replacing the scaled dot-product (within softmax)
attention with the negative-log of Euclidean distance. This form of attention
simplifies to inverse distance weighting interpolation. Used in simple
one-hidden-layer networks and trained with vanilla cross-entropy loss on
classification problems, it tends to produce a key matrix containing prototypes
and a value matrix with corresponding logits. We also show that the resulting
interpretable networks can be augmented with manually-constructed prototypes to
perform low-impact handling of special cases.
( 2
min )
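The simplification the abstract mentions is easy to verify in code (our sketch of the mechanism, not the paper's implementation): applying softmax to scores $-\log \lVert q - k_i\rVert$ yields weights proportional to $e^{-\log d_i} = 1/d_i$, i.e. classic inverse distance weighting over the value vectors:

```python
import numpy as np

def neg_log_distance_attention(query, keys, values, eps=1e-9):
    """Attention with scores = -log(Euclidean distance). The softmax
    numerators exp(-log d_i) equal 1/d_i, so the output is inverse
    distance weighting interpolation; eps avoids log(0) at an exact key."""
    d = np.linalg.norm(keys - query, axis=-1) + eps
    w = np.exp(-np.log(d))          # = 1/d, the softmax numerator
    w = w / w.sum()                 # softmax normalization
    return w @ values
```

A query sitting on a key reproduces that key's value; a query equidistant from two keys averages their values, which is the prototype-like behavior the abstract describes.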
In this paper, we study the method to reconstruct dynamical systems from data
without time labels. Data without time labels appear in many applications, such
as molecular dynamics, single-cell RNA sequencing etc. Reconstruction of
dynamical system from time sequence data has been studied extensively. However,
these methods do not apply if time labels are unknown. Without time labels,
sequence data becomes distribution data. Based on this observation, we propose
to treat the data as samples from a probability distribution and try to
reconstruct the underlying dynamical system by minimizing the distribution
loss, sliced Wasserstein distance more specifically. Extensive experiment
results demonstrate the effectiveness of the proposed method.
( 2
min )
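The sliced Wasserstein loss used above has a compact Monte-Carlo form for equally-sized point clouds (a generic sketch of the distance, not the paper's training code): project both clouds onto random unit directions and average the 1-D Wasserstein-2 distances, which reduce to differences of sorted projections.

```python
import numpy as np

def sliced_wasserstein2(X, Y, n_proj=200, seed=0):
    """Monte-Carlo sliced Wasserstein-2 distance between two point clouds
    of equal size: average squared 1-D W2 distances over random
    directions; in 1-D, optimal transport matches sorted samples."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.standard_normal(X.shape[1])
        theta /= np.linalg.norm(theta)
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)
    return np.sqrt(total / n_proj)
```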
Sentiment analysis of social media data is an emerging field with vast
applications in various domains. In this study, we developed a sentiment
analysis model to analyze social media sentiment, especially tweets, during
global conflicting scenarios. To establish our research experiment, we
identified a recent global dispute incident on Twitter and collected around
31,000 filtered Tweets for several months to analyze human sentiment worldwide.
( 2
min )
A simple graph on $n$ vertices may contain a lot of maximum cliques. But how
many can it potentially contain? We will define prime and composite graphs, and
we will show that if $n \ge 15$, then the grpahs with the maximum number of
maximum cliques have to be composite. Moreover, we will show an edge bound from
which we will prove that if any factor of a composite graph has $\omega(G_i)
\ge 5$, then it cannot have the maximum number of maximum cliques. Using this
we will show that the graph that contains $3^{\lfloor n/3 \rfloor}c$ maximum
cliques has the largest number of maximum cliques on $n$ vertices, where
$c\in\{1,\frac{4}{3},2\}$, depending on $n \text{ mod } 3$.
( 2
min )
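The extremal count is easy to check by brute force on a small composite graph (our illustration): the complete tripartite graph $K_{3,3,3}$ on $n = 9$ vertices has maximum cliques formed by picking one vertex from each part, giving $3^{\lfloor 9/3 \rfloor} = 27$ of them.

```python
from itertools import combinations

def count_maximum_cliques(n, edges):
    """Brute-force maximum-clique count for a tiny graph: enumerate all
    vertex subsets, keep the cliques, and count those of largest size.
    Exponential in n, so for illustration only."""
    adj = [[False] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = True
    best, count = 0, 0
    for mask in range(1, 1 << n):
        verts = [i for i in range(n) if mask >> i & 1]
        if all(adj[u][v] for u, v in combinations(verts, 2)):
            if len(verts) > best:
                best, count = len(verts), 1
            elif len(verts) == best:
                count += 1
    return best, count
```

Edges of $K_{3,3,3}$ join every pair of vertices lying in different parts $\{0,1,2\}, \{3,4,5\}, \{6,7,8\}$.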
We define and study a fully-convolutional neural network stochastic model,
NN-Turb, which generates a 1-dimensional field with some turbulent velocity
statistics. In particular, the generated process satisfies the Kolmogorov 2/3
law for second order structure function. It also presents negative skewness
across scales (i.e. Kolmogorov 4/5 law) and exhibits intermittency as
characterized by skewness and flatness. Furthermore, our model is never in
contact with turbulent data and only needs the desired statistical behavior of
the structure functions across scales for training.
( 2
min )
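The statistics NN-Turb is trained to match are structure functions; a minimal estimator for a uniformly sampled 1-D field follows (generic diagnostic, not the paper's code). The Kolmogorov 2/3 law predicts $S_2(r) \sim r^{2/3}$ in the inertial range, while the 4/5 law concerns the signed third-order increment (drop the absolute value for that check).

```python
import numpy as np

def structure_function(u, p, lags):
    """p-th order structure function S_p(r) = <|u(x+r) - u(x)|^p> of a
    1-D field u sampled on a uniform grid, for integer lags r."""
    return np.array([np.mean(np.abs(u[r:] - u[:-r]) ** p) for r in lags])
```

On a linear ramp the increments over lag $r$ equal $r$ exactly, so $S_2(r) = r^2$, a quick sanity check.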
Multi-distribution learning generalizes the classic PAC learning to handle
data coming from multiple distributions. Given a set of $k$ data distributions
and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis
that minimizes the maximum population loss over $k$ distributions, up to
$\epsilon$ additive error. In this paper, we settle the sample complexity of
multi-distribution learning by giving an algorithm of sample complexity
$\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the
lower bound up to sub-polynomial factor and resolves the COLT 2023 open problem
of Awasthi, Haghtalab and Zhao [AHZ23].
( 2
min )
Reliable uncertainty quantification (UQ) in machine learning (ML) regression
tasks is becoming the focus of many studies in materials and chemical science.
It is now well understood that average calibration is insufficient, and most
studies implement additional methods testing the conditional calibration with
respect to uncertainty, i.e. consistency. Consistency is assessed mostly by
so-called reliability diagrams. There exists, however, another target beyond average
calibration: conditional calibration with respect to input features,
i.e. adaptivity. In practice, adaptivity is the main concern of the final users
of an ML-UQ method, who seek reliable predictions and uncertainties
for any point in feature space. This article aims to show that consistency and
adaptivity are complementary validation targets, and that a good consistency
does not imply a good adaptivity. Adapted validation methods are proposed and
illustrated on a representative example.
( 2
min )
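One simple conditional-calibration diagnostic (our construction, one of several used in this literature) bins the test set by a conditioning variable and computes the mean squared z-score per bin: binning by predicted uncertainty probes consistency, binning by an input feature probes adaptivity, and reliable UQ should give values near 1 in every bin, not just on average.

```python
import numpy as np

def binned_zsq(y_true, y_pred, u_pred, key, n_bins=5):
    """Mean squared z-score <z^2> per bin, with bins formed by sorting on
    `key` (predicted uncertainty -> consistency; an input feature ->
    adaptivity). Well-calibrated uncertainties give <z^2> ~ 1 per bin."""
    z2 = ((y_true - y_pred) / u_pred) ** 2
    order = np.argsort(key)
    return np.array([z2[b].mean() for b in np.array_split(order, n_bins)])
```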
We present a performant, general-purpose gradient-guided nested sampling
algorithm, ${\tt GGNS}$, combining the state of the art in differentiable
programming, Hamiltonian slice sampling, clustering, mode separation, dynamic
nested sampling, and parallelization. This unique combination allows ${\tt
GGNS}$ to scale well with dimensionality and perform competitively on a variety
of synthetic and real-world problems. We also show the potential of combining
nested sampling with generative flow networks to obtain large amounts of
high-quality samples from the posterior distribution. This combination leads to
faster mode discovery and more accurate estimates of the partition function.
( 2
min )
To tackle long planning horizon problems in reinforcement learning with
general function approximation, we propose the first algorithm, termed
UCRL-WVTR, whose regret bound is both \emph{horizon-free} and
\emph{instance-dependent}, eliminating the polynomial dependency on the
planning horizon. The derived regret bound is \emph{sharp}, as it
matches the minimax lower bound when specialized to linear mixture MDPs up to
logarithmic factors. Furthermore, UCRL-WVTR is \emph{computationally efficient}
with access to a regression oracle. The achievement of such a horizon-free,
instance-dependent, and sharp regret bound hinges upon (i) novel algorithm
designs: weighted value-targeted regression and a high-order moment estimator
in the context of general function approximation; and (ii) fine-grained
analyses: a novel concentration bound of weighted non-linear least squares and
a refined analysis which leads to the tight instance-dependent bound. We also
conduct comprehensive experiments to corroborate our theoretical findings.
( 2
min )
In the era of fast-paced precision medicine, observational studies play a
major role in properly evaluating new treatments in clinical practice. Yet,
unobserved confounding can significantly compromise causal conclusions drawn
from non-randomized data. We propose a novel strategy that leverages randomized
trials to quantify unobserved confounding. First, we design a statistical test
to detect unobserved confounding with strength above a given threshold. Then,
we use the test to estimate an asymptotically valid lower bound on the
unobserved confounding strength. We evaluate the power and validity of our
statistical test on several synthetic and semi-synthetic datasets. Further, we
show how our lower bound can correctly identify the absence and presence of
unobserved confounding in a real-world setting.
( 2
min )
Inventory management is crucial for businesses, but it can be tedious. It can make or break a business, regardless of its age. AI has revolutionized business management and inventory control. AI can now do more than just follow instructions. It can analyze inventory history, predict customer behavior, and anticipate business needs. Want to know what…
The post Harness the power of an AI-powered forecasting model to revitalize your business appeared first on Data Science Central.
( 26
min )
Between the two of them, ChatGPT4 can generate the lyrics to Christmas carols, and DALL-E3 can illustrate them!
Throw your old carol books away because this is the only guide you'll need.
12 Days of Christmas
"Please generate an illustration where each of the 12 days'
( 3
min )
AI Weirdness: the strange side of machine learning
( 2
min )
MIT researchers develop a customized onboarding process that helps a human learn when a model’s advice is trustworthy.
( 11
min )
We introduce SwiftSage, a novel agent framework inspired by the dual-process
theory of human cognition, designed to excel in action planning for complex
interactive reasoning tasks. SwiftSage integrates the strengths of behavior
cloning and prompting large language models (LLMs) to enhance task completion
performance. The framework comprises two primary modules: the Swift module,
representing fast and intuitive thinking, and the Sage module, emulating
deliberate thought processes. The Swift module is a small encoder-decoder LM
fine-tuned on the oracle agent's action trajectories, while the Sage module
employs LLMs such as GPT-4 for subgoal planning and grounding. We develop a
heuristic method to harmoniously integrate the two modules, resulting in a more
efficient and robust problem-solving process. In 30 tasks from the ScienceWorld
benchmark, SwiftSage significantly outperforms other methods such as SayCan,
ReAct, and Reflexion, demonstrating its effectiveness in solving complex
interactive tasks.
( 2
min )
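The dual-process dispatch described above can be sketched minimally (the function names and confidence heuristic here are illustrative assumptions, not SwiftSage's actual API): route through the fast module unless its confidence falls below a threshold, then fall back to deliberate planning.

```python
def swift(obs):
    """Stand-in for the small fine-tuned encoder-decoder LM:
    returns (action, confidence)."""
    table = {"door closed": ("open door", 0.95), "novel object": ("???", 0.30)}
    return table.get(obs, ("wait", 0.50))

def sage(obs):
    """Stand-in for LLM-based subgoal planning (slow but robust)."""
    return f"plan-subgoals-for({obs})"

def act(obs, conf_threshold=0.7):
    """Heuristic dispatch: fast path when confident, slow path otherwise."""
    action, conf = swift(obs)
    if conf >= conf_threshold:
        return ("swift", action)
    return ("sage", sage(obs))

route_a = act("door closed")    # confident -> fast module
route_b = act("novel object")   # uncertain -> deliberate module
```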
Maintenance work orders are commonly used to document information about wind
turbine operation and maintenance. This includes details about proactive and
reactive wind turbine downtimes, such as preventative and corrective
maintenance. However, the information contained in maintenance work orders is
often unstructured and difficult to analyze, presenting challenges for
decision-makers wishing to use it for optimizing operation and maintenance. To
address this issue, this work compares three different approaches to calculate
reliability by performance indicators from maintenance work orders. The first
approach involves manual labeling of the maintenance work orders by domain
experts, using the schema defined in an industrial guideline to assign the
label accordingly. The second approach involves the development of a model that
automatically labels the maintenance work orders using text classification
methods. Through this method, we achieve macro-average and weighted-average
F1-scores of 0.75 and 0.85, respectively. The third technique uses an
AI-assisted tagging tool to tag and structure the raw maintenance information,
together with a novel rule-based approach for extracting relevant maintenance
work orders for failure rate calculation. In our experiments, the AI-assisted
tool leads to an 88% drop in tagging time in comparison to the other two
approaches, while expert labeling and text classification are more accurate in
KPI extraction. Overall, our findings make extracting maintenance information
from maintenance work orders more efficient, enable the assessment of
reliability key performance indicators and therefore support the optimization
of wind turbine operation and maintenance.
( 3
min )
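The gap between the macro-average and weighted-average F1 reported above arises whenever classes are imbalanced: macro weights every class equally, weighted weights by support. A small self-contained computation (toy work-order labels, not the paper's data) makes the distinction explicit:

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Per-class F1 plus macro (unweighted mean) and weighted
    (support-weighted mean) averages."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        per_class[c] = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return per_class, macro, weighted

# A rare class drags macro-F1 below weighted-F1, mirroring 0.75 vs 0.85.
y_true = ["corrective"] * 8 + ["preventive"] * 2
y_pred = ["corrective"] * 8 + ["corrective", "preventive"]
per_class, macro, weighted = f1_scores(y_true, y_pred)
```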
Physics informed neural networks (PINNs) have recently been widely used for
robust and accurate approximation of PDEs. We provide rigorous upper bounds on
the generalization error of PINNs approximating solutions of the forward
problem for PDEs. An abstract formalism is introduced and stability properties
of the underlying PDE are leveraged to derive an estimate for the
generalization error in terms of the training error and number of training
samples. This abstract framework is illustrated with several examples of
nonlinear PDEs. Numerical experiments, validating the proposed theory, are also
presented.
( 2
min )
Recent advances in language models (LMs) have demonstrated significant
efficacy in tasks related to the arts and humanities. While LMs have exhibited
exceptional performance across a wide range of natural language processing
tasks, there are notable challenges associated with their utilization on small
datasets and their ability to replicate more creative human capacities. In this
study, we aim to address these challenges by training a Persian classical
poetry generation model using a transformer architecture on a specialized
dataset with no pretraining. Additionally, we propose a novel decoding method
to enhance coherence and meaningfulness in the generated poetry, effectively
managing the tradeoff between diversity and quality. Furthermore, the results
of our training approach and the proposed decoding method are evaluated through
a comprehensive set of automatic and human evaluations, which show a superior
capability to generate coherent and meaningful poetry compared to other
decoding methods and an existing Persian large language model (LLM).
( 2
min )
Knowledge graph construction (KGC) is a multifaceted undertaking involving
the extraction of entities, relations, and events. Traditionally, large
language models (LLMs) have been viewed as solitary task-solving agents in this
complex landscape. However, this paper challenges this paradigm by introducing
a novel framework, CooperKGC. Departing from the conventional approach,
CooperKGC establishes a collaborative processing network, assembling a KGC
collaboration team capable of concurrently addressing entity, relation, and
event extraction tasks. Our experiments unequivocally demonstrate that
fostering collaboration and information interaction among diverse agents within
CooperKGC yields superior results compared to individual cognitive processes
operating in isolation. Importantly, our findings reveal that the collaboration
facilitated by CooperKGC enhances knowledge selection, correction, and
aggregation capabilities across multiple rounds of interactions.
( 2
min )
Recent research on online Gradient Balancing (GraB) has revealed that there
exist permutation-based example orderings for SGD that are guaranteed to
outperform random reshuffling (RR). Whereas RR arbitrarily permutes training
examples, GraB leverages stale gradients from prior epochs to order examples --
achieving a provably faster convergence rate than RR. However, GraB is limited
by design: while it demonstrates an impressive ability to scale up training on
centralized data, it does not naturally extend to modern distributed ML
workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which
uses insights from prior work on kernel thinning to translate the benefits of
provably faster permutation-based example ordering to distributed settings.
With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate
over centralized GraB and outperforms distributed RR on a variety of benchmark
tasks.
( 2
min )
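The balancing idea behind GraB-style ordering can be illustrated with a toy scalar-gradient sketch (a simplification of the published algorithm, which balances gradient vectors via herding): greedily sign each centered stale gradient to keep the running sum small, then place positively signed examples at the front and negatively signed ones, reversed, at the back.

```python
def balance_order(grads):
    """Toy 1-D example ordering from stale gradients: choose, for each
    centered gradient, the sign that keeps the running sum balanced."""
    mean = sum(grads) / len(grads)
    centered = [g - mean for g in grads]
    run, front, back = 0.0, [], []
    for i, c in enumerate(centered):
        if abs(run + c) <= abs(run - c):   # "+" sign balances better
            run += c
            front.append(i)
        else:                              # "-" sign balances better
            run -= c
            back.append(i)
    return front + back[::-1]

grads = [5.0, -3.0, 2.0, -4.0, 1.0, -1.0]
order = balance_order(grads)   # a permutation of range(6)
```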
When optimizing problems with uncertain parameter values in a linear
objective, decision-focused learning enables end-to-end learning of these
values. We are interested in a stochastic scheduling problem, in which
processing times are uncertain, which brings uncertain values in the
constraints, and thus repair of an initial schedule may be needed. Historical
realizations of the stochastic processing times are available. We show how
existing decision-focused learning techniques based on stochastic smoothing can
be adapted to this scheduling problem. We include an extensive experimental
evaluation to investigate in which situations decision-focused learning
outperforms the state of the art for such situations: scenario-based stochastic
optimization.
( 2
min )
Among the commonly used non-destructive techniques, the Ground Penetrating
Radar (GPR) is one of the most widely adopted today for assessing pavement
conditions in France. However, conventional radar systems and their forward
processing methods have shown their limitations for the physical and
geometrical characterization of very thin layers such as tack coats.
Nevertheless, the use of Machine Learning methods applied to GPR with an
inverse approach showed that it was numerically possible to identify the tack
coat characteristics despite masking effects due to low time-frequency
resolution
noted in the raw B-scans. Thus, we propose in this paper to apply the inverse
approach based on Machine Learning, already validated in previous works on
numerical data, on two experimental cases with different pavement structures.
The first case corresponds to a validation on known pavement structures on the
Gustave Eiffel University (Nantes, France) with its pavement fatigue carousel
and the second case focuses on a new real road in the Vendée department
(France). In both case studies, the performance of SVM/SVR methods demonstrated
the efficiency of supervised learning for classifying and estimating the
emulsion proportioning in the tack coats.
( 3
min )
This research introduces a sophisticated transfer learning model based on
Google's MobileNetV2 for breast cancer tumor classification into normal,
benign, and malignant categories, utilizing a dataset of 1576 ultrasound images
(265 normal, 891 benign, 420 malignant). The model achieves an accuracy of
0.82, precision of 0.83, recall of 0.81, ROC-AUC of 0.94, PR-AUC of 0.88, and
MCC of 0.74. It examines image intensity distributions and misclassification
errors, offering improvements for future applications. Addressing dataset
imbalances, the study ensures a generalizable model. This work, using a dataset
from Baheya Hospital, Cairo, Egypt, compiled by Walid Al-Dhabyani et al.,
emphasizes MobileNetV2's potential in medical imaging, aiming to improve
diagnostic precision in oncology. Additionally, the paper explores
Streamlit-based deployment for real-time tumor classification, demonstrating
MobileNetV2's applicability in medical imaging and setting a benchmark for
future research in oncology diagnostics.
( 2
min )
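The metrics listed above can be reproduced from confusion-matrix counts. A minimal sketch (binary case with hypothetical counts chosen for illustration; the paper's task is 3-class, for which a multiclass MCC generalization applies):

```python
import math

def metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and Matthews correlation coefficient
    from a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, accuracy, mcc

p, r, a, m = metrics(tp=81, fp=17, fn=19, tn=83)
# precision ~0.83, recall ~0.81; MCC rewards balance across all four cells,
# which is why it is a useful companion to accuracy on imbalanced data.
```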
We study the asymptotic generalization of an overparameterized linear model
for multiclass classification under the Gaussian covariates bi-level model
introduced in Subramanian et al.~'22, where the number of data points,
features, and classes all grow together. We fully resolve the conjecture posed
in Subramanian et al.~'22, matching the predicted regimes for generalization.
Furthermore, our new lower bounds are akin to an information-theoretic strong
converse: they establish that the misclassification rate goes to 0 or 1
asymptotically. One surprising consequence of our tight results is that the
min-norm interpolating classifier can be asymptotically suboptimal relative to
noninterpolating classifiers in the regime where the min-norm interpolating
regressor is known to be optimal.
The key to our tight analysis is a new variant of the Hanson-Wright
inequality which is broadly useful for multiclass problems with sparse labels.
As an application, we show that the same type of analysis can be used to
analyze the related multilabel classification problem under the same bi-level
ensemble.
( 2
min )
Recent advances in machine learning, specifically transformer architecture,
have led to significant advancements in commercial domains. These powerful
models have demonstrated superior capability to learn complex relationships and
often generalize better to new data and problems. This paper presents a novel
transformer-powered approach for enhancing prediction accuracy in multi-modal
output scenarios, where sparse experimental data is supplemented with
simulation data. The proposed approach integrates transformer-based
architecture with a novel graph-based hyper-parameter optimization technique.
The resulting system not only effectively reduces simulation bias, but also
achieves superior prediction accuracy compared to the prior method. We
demonstrate the efficacy of our approach on inertial confinement fusion
experiments, where only 10 shots of real-world data are available, as well as
synthetic versions of these experiments.
( 2
min )
This paper engages in a speculative exploration of the concept of an
artificial agent capable of conducting research. Initially, it examines how the
act of research can be conceptually characterized, aiming to provide a starting
point for discussions about what it means to create such agents. The focus then
shifts to the core components of research: question formulation, hypothesis
generation, and hypothesis verification. This discussion includes a
consideration of the potential and challenges associated with enabling machines
to autonomously perform these tasks. Subsequently, this paper briefly considers
the overlapping themes and interconnections that underlie them. Finally, the
paper presents preliminary thoughts on prototyping as an initial step towards
uncovering the challenges involved in developing these research-capable agents.
( 2
min )
In this paper, we propose a dimensionless anomaly detection method for
multivariate streams. Our method is independent of the unit of measurement for
the different stream channels, therefore dimensionless. We first propose the
variance norm, a generalisation of Mahalanobis distance to handle
infinite-dimensional feature space and singular empirical covariance matrix
rigorously. We then combine the variance norm with the path signature, an
infinite collection of iterated integrals that provide global features of
streams, to propose SigMahaKNN, a method for anomaly detection on
(multivariate) streams. We show that SigMahaKNN is invariant to stream
reparametrisation and stream concatenation, and has a graded discrimination
power depending on the truncation level of the path signature. We implement
SigMahaKNN as open-source software and perform extensive numerical
experiments, showing significantly improved anomaly detection on streams
compared to isolation forest and local outlier factor in applications ranging
from language analysis and handwriting analysis to ship movement path analysis
and univariate time-series analysis.
( 2
min )
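The "dimensionless" property can be illustrated with a much-simplified sketch: standardise each channel by its empirical deviation (a diagonal stand-in for the variance norm; the real method uses the full variance norm on path-signature features), then score a point by its distance to the k-th nearest neighbour. Rescaling a channel's units leaves the score unchanged.

```python
import statistics

def channel_scales(corpus):
    """Per-channel empirical standard deviation (diagonal normalisation)."""
    dims = len(corpus[0])
    return [statistics.pstdev([p[d] for p in corpus]) or 1.0
            for d in range(dims)]

def knn_score(x, corpus, scales, k=3):
    """Anomaly score: standardised distance to the k-th nearest neighbour."""
    def dist(a, b):
        return sum(((a[d] - b[d]) / scales[d]) ** 2
                   for d in range(len(a))) ** 0.5
    return sorted(dist(x, p) for p in corpus)[k - 1]

# Channels in different units (e.g. metres vs millivolts).
corpus = [(1.0, 100.0), (1.2, 110.0), (0.9, 95.0), (1.1, 105.0)]
s = channel_scales(corpus)
score = knn_score((3.0, 300.0), corpus, s)

# Rescale channel 0 (metres -> millimetres): the score is unchanged,
# i.e. the method is independent of the unit of measurement.
scaled = [(a * 1000, b) for a, b in corpus]
s2 = channel_scales(scaled)
score2 = knn_score((3000.0, 300.0), scaled, s2)
```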
Algorithms make a growing portion of policy and business decisions. We
develop a treatment-effect estimator using algorithmic decisions as instruments
for a class of stochastic and deterministic algorithms. Our estimator is
consistent and asymptotically normal for well-defined causal effects. A special
case of our setup is multidimensional regression discontinuity designs with
complex boundaries. We apply our estimator to evaluate the Coronavirus Aid,
Relief, and Economic Security Act, which allocated many billions of dollars
worth of relief funding to hospitals via an algorithmic rule. The funding is
shown to have little effect on COVID-19-related hospital activities. Naive
estimates exhibit selection bias.
( 2
min )
There has been a lot of work in question generation in which different methods
of providing target answers as input have been employed. This experimentation
has been mostly carried out for RNN-based models. We use three different
methods and their combinations for incorporating answer information and explore
their effect on several automatic evaluation metrics. The methods used are
answer prompting, a custom product method using answer embeddings and encoder
outputs, choosing sentences from the input paragraph that contain
answer-related information, and a separate cross-attention block in the decoder
which attends to the answer. We observe that answer prompting without any
additional modes obtains the best ROUGE and METEOR scores. Additionally, we use
a custom metric to calculate how many of the generated questions have the same
answer as the one used to generate them.
( 2
min )
We present a robust membership inference attack (RMIA) that amplifies the
distinction between population data and the training data on any target model,
by effectively leveraging both reference models and reference data in our
likelihood ratio test. Our algorithm exhibits superior test power
(true-positive rate) when compared to prior methods, even at extremely low
false-positive error rates (as low as 0). Also, under computation constraints,
where only a limited number of reference models (as few as 1) are available,
our method performs exceptionally well, unlike some prior attacks that approach
random guessing in such scenarios. Our method lays the groundwork for
cost-effective and practical yet powerful and robust privacy risk analysis of
machine learning algorithms.
( 2
min )
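The core likelihood-ratio idea can be sketched with toy Gaussian stand-ins for models (this illustrates the general test, not the paper's exact RMIA statistic): a point is flagged as a training member when the target model assigns it much higher likelihood than reference models, trained without it, do.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return (math.exp(-0.5 * ((x - mu) / sigma) ** 2)
            / (sigma * math.sqrt(2 * math.pi)))

def membership_score(x, target, references):
    """Likelihood ratio: target model vs. average reference likelihood."""
    ref = sum(gauss_pdf(x, mu, s) for mu, s in references) / len(references)
    return gauss_pdf(x, *target) / ref

# The "target model" is sharply concentrated around its training point
# x = 2.0; the "reference models" were fit on population data near 0.
target = (2.0, 0.5)
references = [(0.0, 1.0), (0.1, 1.1), (-0.2, 0.9)]
member_score = membership_score(2.0, target, references)       # large
non_member_score = membership_score(-0.1, target, references)  # tiny
```

The ratio cleanly separates the member from the population point; the paper's contribution is making this test powerful with very few reference models.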
In causal models, a given mechanism is assumed to be invariant to changes of
other mechanisms. While this principle has been utilized for inference in
settings where the causal variables are observed, theoretical insights when the
variables of interest are latent are largely missing. We assay the connection
between invariance and causal representation learning by establishing
impossibility results which show that invariance alone is insufficient to
identify latent causal variables. Together with practical considerations, we
use these theoretical findings to highlight the need for additional constraints
in order to identify representations by exploiting invariance.
( 2
min )
Associated to each graph G is a Gaussian graphical model. Such models are
often used in high-dimensional settings, i.e. where there are relatively few
data points compared to the number of variables. The maximum likelihood
threshold of a graph is the minimum number of data points required to fit the
corresponding graphical model using maximum likelihood estimation. Graphical
lasso is a method for selecting and fitting a graphical model. In this project,
we ask: when graphical lasso is used to select and fit a graphical model on n
data points, how likely is it that n is greater than or equal to the maximum
likelihood threshold of the corresponding graph? Our results are a series of
computational experiments.
( 2
min )
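To see why the maximum likelihood threshold matters, a toy check (pure Python, with our own rank routine) shows that with fewer samples than variables the empirical covariance is singular, so the unpenalized Gaussian MLE does not exist, whereas graphical lasso's penalty always yields a well-defined estimate:

```python
import random

def empirical_cov(data):
    """Empirical covariance matrix of n samples of dimension p."""
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in data) / n
             for j in range(p)] for i in range(p)]

def rank(mat, tol=1e-10):
    """Numerical rank via Gauss-Jordan elimination."""
    m = [row[:] for row in mat]
    r = 0
    for col in range(len(m[0])):
        if r == len(m):
            break
        pivot = max(range(r, len(m)), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

rng = random.Random(0)
cov = empirical_cov([[rng.gauss(0, 1) for _ in range(5)]
                     for _ in range(3)])        # n = 3 < p = 5
# rank(cov) == n - 1 == 2 < p: singular, no unpenalized MLE.
full_cov = empirical_cov([[rng.gauss(0, 1) for _ in range(5)]
                          for _ in range(50)])  # n = 50 >> p = 5
```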
The partially observable constrained optimization problems (POCOPs) impede
data-driven optimization techniques since an infeasible solution of POCOPs can
provide little information about the objective as well as the constraints. We
endeavor to design an efficient and provable method for expensive POCOPs under
the framework of constrained Bayesian optimization. Our method consists of two
key components. Firstly, we present an improved design of the acquisition
functions that introduces balanced exploration during optimization. We
rigorously study the convergence properties of this design to demonstrate its
effectiveness. Secondly, we propose a Gaussian process embedding different
likelihoods as the surrogate model for a partially observable constraint. This
model leads to a more accurate representation of the feasible regions compared
to traditional classification-based models. Our proposed method is empirically
studied on both synthetic and real-world problems. The results demonstrate the
competitiveness of our method for solving POCOPs.
( 2
min )
The central problem in materials science is to discover materials with desired properties. MatterGen enables broad property-guided materials design.
The post MatterGen: Property-guided materials design appeared first on Microsoft Research.
( 8
min )
Advanced prompting technologies for LLMs can lead to excessively long prompts, causing issues. Learn how LLMLingua compresses prompts up to 20x, maintaining quality, reducing latency, and supporting improved UX.
The post LLMLingua: Innovating LLM efficiency with prompt compression appeared first on Microsoft Research.
( 10
min )
Accessibility is a key element that all designers must consider before constructing a space or product — but the evaluation process has traditionally been tedious and time-consuming. Mathew Schwartz, an assistant professor in architecture and design at the New Jersey Institute of Technology, is using the NVIDIA Omniverse platform and the Universal Scene Description framework…
( 7
min )
It’s a fortuitous GFN Thursday with 17 new games joining the GeForce NOW library, including The Day Before, Avatar: Frontiers of Pandora and the 100th PC Game Pass title to join the cloud — Ori and the Will of the Wisps. This week also marks a milestone: over 500 games and applications now support RTX…
( 8
min )
This is a guest post co-authored by Nafi Ahmet Turgut, Mehmet İkbal Özmen, Hasan Burak Yel, Fatma Nur Dumlupınar Keşir, Mutlu Polatcan and Emre Uzel from Getir. Getir is the pioneer of ultrafast grocery delivery. The technology company has revolutionized last-mile delivery with its grocery in-minutes delivery proposition. Getir was founded in 2015 and operates […]
( 8
min )
Using machine learning, the computational method can provide details of how materials work as catalysts, semiconductors, or battery components.
( 11
min )
Double descent presents a counter-intuitive aspect within the machine
learning domain, and researchers have observed its manifestation in various
models and tasks. While some theoretical explanations have been proposed for
this phenomenon in specific contexts, an accepted theory to account for its
occurrence in deep learning remains yet to be established. In this study, we
revisit the phenomenon of double descent and demonstrate that its occurrence is
strongly influenced by the presence of noisy data. Through conducting a
comprehensive analysis of the feature space of learned representations, we
unveil that double descent arises in imperfect models trained with noisy data.
We argue that double descent is a consequence of the model first learning the
noisy data until interpolation and then, through the implicit regularization
of over-parameterization, acquiring the capability to separate the
information from the noise.
( 2
min )
Adopting reasonable strategies is challenging but crucial for an intelligent
agent with limited resources working in hazardous, unstructured, and dynamic
environments to improve the system's utility, decrease the overall cost, and
increase mission success probability. This paper proposes a novel directed
acyclic strategy graph decomposition approach based on Bayesian chaining to
separate an intricate policy into several simple sub-policies and organize
their relationships as Bayesian strategy networks (BSN). We integrate this
approach into the state-of-the-art DRL method -- soft actor-critic (SAC), and
build the corresponding Bayesian soft actor-critic (BSAC) model by organizing
several sub-policies as a joint policy. We compare our method against the
state-of-the-art deep reinforcement learning algorithms on the standard
continuous control benchmarks in the OpenAI Gym environment. The results
demonstrate the promising potential of the BSAC method to significantly
improve training efficiency.
( 2
min )
Computational pathology models rarely utilise data that will not be available
for inference. This means most models cannot learn from highly informative data
such as additional immunohistochemical (IHC) stains and spatial
transcriptomics. We present TriDeNT, a novel self-supervised method for
utilising privileged data that is not available during inference to improve
performance. We demonstrate the efficacy of this method for a range of
different paired data including immunohistochemistry, spatial transcriptomics
and expert nuclei annotations. In all settings, TriDeNT outperforms other
state-of-the-art methods in downstream tasks, with observed improvements of up
to 101%. Furthermore, we provide qualitative and quantitative measurements of
the features learned by these models and how they differ from baselines.
TriDeNT offers a novel method to distil knowledge from scarce or costly data
during training, to create significantly better models for routine inputs.
( 2
min )
Guaranteeing safe behaviour of reinforcement learning (RL) policies poses
significant challenges for safety-critical applications, despite RL's
generality and scalability. To address this, we propose a new approach to apply
verification methods from control theory to learned value functions. By
analyzing task structures for safety preservation, we formalize original
theorems that establish links between value functions and control barrier
functions. Further, we propose novel metrics for verifying value functions in
safe control tasks and practical implementation details to improve learning.
Our work presents a novel method for certificate learning, which unlocks a
diversity of verification techniques from control theory for RL policies, and
marks a significant step towards a formal framework for the general, scalable,
and verifiable design of RL-based control systems. Code and videos are
available at https://rl-cbf.github.io/
( 2
min )
Physics-informed neural networks (PINNs) constitute a flexible approach to
both finding solutions and identifying parameters of partial differential
equations. Most works on the topic assume noiseless data, or data contaminated
with weak Gaussian noise. We show that the standard PINN framework breaks down
in case of non-Gaussian noise. We give a way of resolving this fundamental
issue and we propose to jointly train an energy-based model (EBM) to learn the
correct noise distribution. We illustrate the improved performance of our
approach using multiple examples.
( 2
min )
In this paper, we prove that an Adam-type algorithm with smooth clipping
approaches the global minimizer of the regularized non-convex loss function.
Adding smooth clipping and taking the state space as the set of all
trajectories, we can apply the ergodic theory of Markov semigroups for this
algorithm and investigate its asymptotic behavior. The ergodic theory we
establish in this paper reduces the problem of evaluating the convergence,
generalization error and discretization error of this algorithm to the problem
of evaluating the difference between two functional stochastic differential
equations (SDEs) with different drift coefficients. As a result of our
analysis, we have shown that this algorithm minimizes the regularized
non-convex loss function with errors of the form $n^{-1/2}$, $\eta^{1/4}$,
$\beta^{-1} \log (\beta + 1)$ and $e^{- c t}$. Here, $c$ is a constant and $n$,
$\eta$, $\beta$ and $t$ denote the size of the training dataset, learning rate,
inverse temperature and time, respectively.
( 2
min )
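The smooth-clipping idea above can be sketched in a few lines. The clipping function c·tanh(g/c) is one common differentiable surrogate for hard clipping (the paper's exact form is not reproduced here), and all names and hyperparameters are illustrative:

```python
import math

def smooth_clip(g, c=1.0):
    """Smooth surrogate for hard clipping: bounded by c, differentiable everywhere."""
    return c * math.tanh(g / c)

def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, clip=1.0):
    """One Adam-type update with smooth clipping applied to the raw gradient."""
    g = smooth_clip(grad, clip)
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])   # bias-corrected second moment
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

state = {"t": 0, "m": 0.0, "v": 0.0}
x = 5.0
for _ in range(200):
    x = adam_step(x, 2 * x, state, lr=0.1)  # gradient of f(x) = x^2
```

Because tanh saturates, the effective gradient magnitude fed to the Adam update never exceeds c while remaining smooth, which is the property the ergodic analysis relies on.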
Knowledge tracing consists in predicting the performance of some students on
new questions given their performance on previous questions, and can be a prior
step to optimizing assessment and learning. Deep knowledge tracing (DKT) is a
competitive model for knowledge tracing relying on recurrent neural networks,
even if some simpler models may match its performance. However, little is known
about why DKT works so well. In this paper, we frame deep knowledge tracing as
an encoder-decoder architecture. This viewpoint not only allows us to propose
better models in terms of performance, simplicity or expressivity but also
opens up promising avenues for future research directions. In particular, we
show on several small and large datasets that a simpler decoder, with possibly
fewer parameters than the one used by DKT, can predict student performance
better.
( 2
min )
Deep learning (DL) and machine learning (ML) applications have grown
rapidly in recent years. Massive amounts of data are generated over the
internet, from which meaningful results can be derived using ML and DL
algorithms. Hardware resources and open-source libraries have made these
algorithms easy to implement. TensorFlow and PyTorch are two of the leading
frameworks for implementing ML projects, and both allow tracing the operations
executed on the GPU and CPU to analyze resource allocation and consumption.
This paper presents the time and memory allocation of the CPU and GPU while
training deep neural networks using PyTorch. Our analysis shows that the GPU
has a lower running time than the CPU for deep neural networks, whereas for
simpler networks the GPU offers no significant improvement over the CPU.
( 2
min )
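A minimal timing harness in the spirit of this comparison can be written with the standard library alone; the matmul workload is a stand-in for a training step, and the GPU-specific details (device placement, calling torch.cuda.synchronize() before reading the clock, since CUDA kernels launch asynchronously) are deliberately omitted:

```python
import time

def benchmark(fn, *args, warmup=2, repeats=10):
    """Return mean wall-clock seconds per call, discarding warm-up runs."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats

def matmul(a, b):
    """Naive dense matrix product, standing in for a training workload."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = 30
m = [[1.0] * n for _ in range(n)]
t = benchmark(matmul, m, m)
```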
The effectiveness of a model is heavily reliant on the quality of the fusion
representation of multiple modalities in multimodal sentiment analysis.
Moreover, each modality is extracted from raw input and integrated with the
rest to construct a multimodal representation. Although previous methods have
proposed multimodal representations and achieved promising results, most of
them focus on forming positive and negative pairs, neglecting the variation in
sentiment scores within the same class. Additionally, they fail to capture the
significance of unimodal representations in the fusion vector. To address these
limitations, we introduce a framework called Supervised Angular-based
Contrastive Learning for Multimodal Sentiment Analysis. This framework aims to
enhance discrimination and generalizability of the multimodal representation
and overcome biases in the fusion vector's modality. Our experimental results,
along with visualizations on two widely used datasets, demonstrate the
effectiveness of our approach.
( 2
min )
We discuss the fundamental issue of identification in linear instrumental
variable (IV) models with unknown IV validity. With the assumption of the
"sparsest rule", which is equivalent to the plurality rule but becomes
operational in computation algorithms, we investigate and prove the advantages
of non-convex penalized approaches over other IV estimators based on two-step
selections, in terms of selection consistency and accommodation for
individually weak IVs. Furthermore, we propose a surrogate sparsest penalty
that aligns with the identification condition and provides oracle sparse
structure simultaneously. Desirable theoretical properties are derived for the
proposed estimator with weaker IV strength conditions compared to the previous
literature. Finite sample properties are demonstrated using simulations and the
selection and estimation method is applied to an empirical study concerning the
effect of BMI on diastolic blood pressure.
( 2
min )
Most neural compression models are trained on large datasets of images or
videos in order to generalize to unseen data. Such generalization typically
requires large and expressive architectures with a high decoding complexity.
Here we introduce C3, a neural compression method with strong rate-distortion
(RD) performance that instead overfits a small model to each image or video
separately. The resulting decoding complexity of C3 can be an order of
magnitude lower than neural baselines with similar RD performance. C3 builds on
COOL-CHIC (Ladune et al.) and makes several simple and effective improvements
for images. We further develop new methodology to apply C3 to videos. On the
CLIC2020 image benchmark, we match the RD performance of VTM, the reference
implementation of the H.266 codec, with less than 3k MACs/pixel for decoding.
On the UVG video benchmark, we match the RD performance of the Video
Compression Transformer (Mentzer et al.), a well-established neural video
codec, with less than 5k MACs/pixel for decoding.
( 2
min )
This paper presents a method for finding a sparse representation of Barron
functions. Specifically, given an $L^2$ function $f$, the inverse scale space
flow is used to find a sparse measure $\mu$ minimising the $L^2$ loss between
the Barron function associated to the measure $\mu$ and the function $f$. The
convergence properties of this method are analysed in an ideal setting and in
the cases of measurement noise and sampling bias. In the ideal setting, the
objective decreases strictly monotonically in time to a minimizer at rate
$\mathcal{O}(1/t)$, and in the case of measurement noise or sampling bias the
optimum is achieved up to a multiplicative or additive constant. This
convergence is preserved on discretization of the parameter space, and the
minimizers on increasingly fine discretizations converge to the optimum on the
full parameter space.
( 2
min )
The Street View House Numbers (SVHN) dataset is a popular benchmark dataset
in deep learning. Originally designed for digit classification tasks, the SVHN
dataset has been widely used as a benchmark for various other tasks including
generative modeling. However, with this work, we aim to warn the community
about an issue of the SVHN dataset as a benchmark for generative modeling
tasks: we discover that the official split into training set and test set of
the SVHN dataset are not drawn from the same distribution. We empirically show
that this distribution mismatch has little impact on the classification task
(which may explain why this issue has not been detected before), but it
severely affects the evaluation of probabilistic generative models, such as
Variational Autoencoders and diffusion models. As a workaround, we propose to
mix and re-split the official training and test set when SVHN is used for tasks
other than classification. We publish a new split and the indices we used to
create it at https://jzenn.github.io/svhn-remix/ .
( 2
min )
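The proposed workaround, pooling the official training and test sets and drawing a fresh split, can be sketched as below. The function name and the choice to keep the official test-set size are our own; the set sizes (73,257 train / 26,032 test) are the standard SVHN ones:

```python
import random

def remix_split(n_train, n_test, seed=0):
    """Pool the official train/test indices, shuffle, and draw a fresh split.

    Keeps the official test-set size so downstream comparisons stay
    size-matched. Index i < n_train refers to official training image i;
    i >= n_train refers to official test image i - n_train.
    """
    pool = list(range(n_train + n_test))
    random.Random(seed).shuffle(pool)
    return pool[n_test:], pool[:n_test]  # (new train, new test)

train_idx, test_idx = remix_split(n_train=73257, n_test=26032)
```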
Toronto Pearson International Airport, in Ontario, Canada, is the country’s largest and busiest airport, serving some 50 million passengers each year. To enhance traveler experiences, the airport in June deployed the Zensors AI platform, which uses anonymized footage from existing security cameras to generate spatial data that helps optimize operations in real time. A member […]
( 7
min )
Move over, Merriam-Webster: Enterprises this year found plenty of candidates to add for word of the year. “Generative AI” and “generative pretrained transformer” were followed by terms such as “large language models” and “retrieval-augmented generation” (RAG) as whole industries turned their attention to transformative new technologies. Generative AI started the year as a blip on […]
( 17
min )
A new era of autonomous vehicle technology, known as AV 2.0, has emerged, marked by large, unified AI models that can control multiple parts of the vehicle stack, from perception and planning to control. Wayve, a London-based autonomous driving technology company, is leading the surf. In the latest episode of NVIDIA’s AI Podcast, host Katie […]
( 6
min )
Despite the seemingly unstoppable adoption of LLMs across industries, they are one component of a broader technology ecosystem that is powering the new AI wave. Many conversational AI use cases require LLMs like Llama 2, Flan T5, and Bloom to respond to user queries. These models rely on parametric knowledge to answer questions. The model […]
( 11
min )
Summarization is the technique of condensing sizable information into a compact and meaningful form, and stands as a cornerstone of efficient communication in our information-rich age. In a world full of data, summarizing long texts into brief summaries saves time and helps make informed decisions. Summarization condenses content, saving time and improving clarity by presenting […]
( 13
min )
Conversational AI has come a long way in recent years thanks to the rapid developments in generative AI, especially the performance improvements of large language models (LLMs) introduced by training techniques such as instruction fine-tuning and reinforcement learning from human feedback. When prompted correctly, these models can carry coherent conversations without any task-specific training data. […]
( 18
min )
This post is co-written with Stanislav Yeshchenko from Q4 Inc. Enterprises turn to Retrieval Augmented Generation (RAG) as a mainstream approach to building Q&A chatbots. We continue to see emerging challenges stemming from the nature of the assortment of datasets available. These datasets are often a mix of numerical and text data, at times structured, […]
( 18
min )
Explore the latest AI innovations aiming to advance the software development lifecycle. AdaptivePaste adapts and refines pasted code snippets in an IDE. InferFix automates bug detection and repair. Discover how.
The post Microsoft at ESEC/FSE 2023: AI techniques for a streamlined coding workflow appeared first on Microsoft Research.
( 10
min )
Research Focus: Using LLMs in a Rust-based formal verification framework; Rethinking network measurements with user feedback; 3D telemedicine using HoloportationTM communication technology could enhance overseas surgical visits.
The post Research Focus: Week of December 4, 2023 appeared first on Microsoft Research.
( 9
min )
During 18 years of leadership, Evans established new R&D mission areas, strengthened ties to the MIT community, and increased inclusion and education efforts.
( 11
min )
The data-driven approach to robot control has been gathering pace rapidly,
yet generalization to unseen task domains remains a critical challenge. We
argue that the key to generalization is representations that are (i) rich
enough to capture all task-relevant information and (ii) invariant to
superfluous variability between the training and the test domains. We
experimentally study such a representation -- containing both depth and
semantic information -- for visual navigation and show that it enables a
control policy trained entirely in simulated indoor scenes to generalize to
diverse real-world environments, both indoors and outdoors. Further, we show
that our representation reduces the A-distance between the training and test
domains, improving the generalization error bound as a result. Our proposed
approach is scalable: the learned policy improves continuously, as the
foundation models that it exploits absorb more diverse data during
pre-training.
( 2
min )
Denoising is intuitively related to projection. Indeed, under the manifold
hypothesis, adding random noise is approximately equivalent to orthogonal
perturbation. Hence, learning to denoise is approximately learning to project.
In this paper, we use this observation to reinterpret denoising diffusion
models as approximate gradient descent applied to the Euclidean distance
function. We then provide a straightforward convergence analysis of the DDIM
sampler under simple assumptions on the projection-error of the denoiser.
Finally, we propose a new sampler based on two simple modifications to DDIM
using insights from our theoretical results. In as few as 5-10 function
evaluations, our sampler achieves state-of-the-art FID scores on pretrained
CIFAR-10 and CelebA models and can generate high quality samples on latent
diffusion models.
( 2
min )
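The reinterpretation of denoising as gradient descent on the squared distance function can be illustrated on a toy manifold, the unit circle, where the exact projection plays the role of an ideal denoiser; all names and the step size here are illustrative, not the paper's sampler:

```python
import math

def project_to_circle(x, y):
    """Exact projection onto the unit circle (an 'ideal denoiser')."""
    r = math.hypot(x, y)
    return x / r, y / r

def denoise_step(x, y, eta=0.5):
    """Gradient step on D(x) = 0.5 * dist(x, manifold)^2; grad D = x - proj(x)."""
    px, py = project_to_circle(x, y)
    return x - eta * (x - px), y - eta * (y - py)

x, y = 3.0, 4.0  # noisy point at distance 4 from the circle
for _ in range(30):
    x, y = denoise_step(x, y)
```

Each step moves the point a fraction eta of the way toward its projection, so the distance to the manifold contracts geometrically, the mechanism behind the convergence analysis sketched in the abstract.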
This paper proposes a multiblock alternating direction method of multipliers
for solving a class of multiblock nonsmooth nonconvex optimization problem with
nonlinear coupling constraints. We employ a majorization minimization procedure
in the update of each block of the primal variables. Subsequential and global
convergence of the generated sequence to a critical point of the augmented
Lagrangian are proved. We also establish iteration complexity and provide
preliminary numerical results for the proposed algorithm.
( 2
min )
Signal Temporal Logic (STL) is a powerful framework for describing the
complex temporal and logical behaviour of dynamical systems. Numerous
studies have attempted to employ reinforcement learning to learn a controller
that enforces STL specifications; however, they have been unable to effectively
tackle the challenges of ensuring robust satisfaction in continuous state space
and maintaining tractability. In this paper, leveraging the concept of funnel
functions, we propose a tractable reinforcement learning algorithm to learn a
time-dependent policy for robust satisfaction of STL specifications in
continuous state space. We demonstrate the utility of our approach on several
STL tasks using different environments.
( 2
min )
Hippocampal atrophy in Alzheimer's disease (AD) is asymmetric and spatially
inhomogeneous. While extensive work has been done on volume and shape analysis
of atrophy of the hippocampus in AD, less attention has been given to
hippocampal asymmetry specifically. Previous studies of hippocampal asymmetry
are limited to global volume or shape measures, which do not localize shape
asymmetry at the point level. In this paper, we propose to quantify localized
shape asymmetry by optimizing point correspondences between left and right
hippocampi within a subject, while simultaneously favoring a compact
statistical shape model of the entire sample. To account for related variables
that have impact on AD and healthy subject differences, we build linear models
with other confounding factors. Our results on the OASIS3 dataset demonstrate
that compared to using volumetric information, shape asymmetry reveals
fine-grained, localized differences that indicate the hippocampal regions of
most significant shape asymmetry in AD patients.
( 2
min )
This work introduces BRILLsson, a novel binary neural network-based
representation learning model for a broad range of non-semantic speech tasks.
We train the model with knowledge distillation from a large and real-valued
TRILLsson model with only a fraction of the dataset used to train TRILLsson.
The resulting BRILLsson models are only 2MB in size with a latency less than
8ms, making them suitable for deployment in low-resource devices such as
wearables. We evaluate BRILLsson on eight benchmark tasks (including but not
limited to spoken language identification, emotion recognition, health
condition diagnosis, and keyword spotting), and demonstrate that our proposed
ultra-light and low-latency models perform as well as large-scale models.
( 2
min )
This paper proposes a weakly-supervised machine learning-based approach
aiming at a tool to alert patients about possible respiratory diseases. Various
types of pathologies may affect the respiratory system, potentially leading to
severe diseases and, in certain cases, death. In general, effective prevention
practices are considered as major actors towards the improvement of the
patient's health condition. The proposed method strives to realize an easily
accessible tool for the automatic diagnosis of respiratory diseases.
Specifically, the method leverages Variational Autoencoder architectures
permitting the usage of training pipelines of limited complexity and relatively
small-sized datasets. Importantly, it offers an accuracy of 57%, which is in
line with the existing strongly-supervised approaches.
( 2
min )
Information Extraction (IE) seeks to derive structured information from
unstructured texts, often facing challenges in low-resource scenarios due to
data scarcity and unseen classes. This paper presents a review of neural
approaches to low-resource IE from \emph{traditional} and \emph{LLM-based}
perspectives, systematically categorizing them into a fine-grained taxonomy.
Then we conduct an empirical study of LLM-based methods compared with previous
state-of-the-art models, and discover that (1) well-tuned LMs are still
predominant; (2) tuning open-resource LLMs and ICL with GPT family is promising
in general; (3) the optimal LLM-based technical solution for low-resource IE
can be task-dependent. In addition, we discuss low-resource IE with LLMs,
highlight promising applications, and outline potential research directions.
This survey aims to foster understanding of this field, inspire new ideas, and
encourage widespread applications in both academia and industry.
( 2
min )
Since ChatGPT works so well, are we on the cusp of solving science with AI?
Does AlphaFold2 suggest that the potential of LLMs in biology and the
sciences more broadly is limitless? Can we use AI itself to bridge the lack of
data in the sciences in order to then train an AI? Herein we present a
discussion of these topics.
( 2
min )
When visualizing a high-dimensional dataset, dimension reduction techniques
are commonly employed, providing a single 2-dimensional view of the data. We
describe ENS-t-SNE: an algorithm for Embedding Neighborhoods Simultaneously
that generalizes the t-Stochastic Neighborhood Embedding approach. By using
different viewpoints in ENS-t-SNE's 3D embedding, one can visualize different
types of clusters within the same high-dimensional dataset. This enables the
viewer to see and keep track of the different types of clusters, which is
harder to do when providing multiple 2D embeddings, where corresponding points
cannot be easily identified. We illustrate the utility of ENS-t-SNE with
real-world applications and provide an extensive quantitative evaluation with
datasets of different types and sizes.
( 2
min )
Traditional partial differential equation (PDE) solvers can be
computationally expensive, which motivates the development of faster methods,
such as reduced-order-models (ROMs). We present GPLaSDI, a hybrid deep-learning
and Bayesian ROM. GPLaSDI trains an autoencoder on full-order-model (FOM) data
and simultaneously learns simpler equations governing the latent space. These
equations are interpolated with Gaussian Processes, allowing for uncertainty
quantification and active learning, even with limited access to the FOM solver.
Our framework is able to achieve up to 100,000 times speed-up and less than 7%
relative error on fluid mechanics problems.
( 2
min )
Training neural networks that require adversarial optimization, such as
generative adversarial networks (GANs) and unsupervised domain adaptations
(UDAs), suffers from instability. This instability problem comes from the
difficulty of the minimax optimization, and there have been various approaches
in GANs and UDAs to overcome this problem. In this study, we tackle this
problem theoretically through a functional analysis. Specifically, we show the
convergence property of the minimax problem by the gradient descent over the
infinite-dimensional spaces of continuous functions and probability measures
under certain conditions. Using this setting, we can discuss GANs and UDAs
comprehensively, which have been studied independently. In addition, we show
that the conditions necessary for the convergence property are interpreted as
stabilization techniques of adversarial training such as the spectral
normalization and the gradient penalty.
( 2
min )
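The instability of minimax optimization, and the effect of a stabilization technique, can be illustrated on the classic bilinear game f(x, y) = x·y: plain simultaneous gradient descent-ascent spirals outward, while the extragradient method (one of many stabilizers studied in this literature, not necessarily the paper's setting) contracts toward the saddle point at the origin:

```python
def gda_step(x, y, lr=0.1):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y: diverges."""
    return x - lr * y, y + lr * x

def extragradient_step(x, y, lr=0.1):
    """Extragradient: evaluate gradients at a look-ahead point, then update."""
    x_mid, y_mid = x - lr * y, y + lr * x   # look-ahead (prediction) step
    return x - lr * y_mid, y + lr * x_mid   # update using look-ahead gradients

xg, yg = 1.0, 1.0  # gradient descent-ascent iterate
xe, ye = 1.0, 1.0  # extragradient iterate
for _ in range(100):
    xg, yg = gda_step(xg, yg)
    xe, ye = extragradient_step(xe, ye)
```

Per step, GDA multiplies the squared distance to the saddle by 1 + lr², while extragradient multiplies it by roughly 1 - lr² + lr⁴ < 1, a concrete instance of why stabilization matters in adversarial training.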
Normative models in neuroimaging learn the brain patterns of healthy
population distribution and estimate how disease subjects like Alzheimer's
Disease (AD) deviate from the norm. Existing variational autoencoder
(VAE)-based normative models using multimodal neuroimaging data aggregate
information from multiple modalities by estimating product or averaging of
unimodal latent posteriors. This can often lead to uninformative joint latent
distributions which affects the estimation of subject-level deviations. In this
work, we addressed the prior limitations by adopting the
Mixture-of-Product-of-Experts (MoPoE) technique which allows better modelling
of the joint latent posterior. Our model labelled subjects as outliers by
calculating deviations from the multimodal latent space. Further, we identified
which latent dimensions and brain regions were associated with abnormal
deviations due to AD pathology.
( 2
min )
In 2023, online payment fraud cost the world US$48 billion. Businesses prioritize fighting payment fraud and minimizing its financial and reputational damage. In addition to monetary losses, payment fraud can damage a customer’s trust and loyalty, as well as increase the scrutiny from regulators and law enforcement. Organizations use machine learning to combat this growing…
The post Decoding the Future: The Intersection of Advanced Analytics and Fraud Prevention in Revolutionizing Digital Payments appeared first on Data Science Central.
( 22
min )
Large language model (LLM) training has become increasingly popular over the last year with the release of several publicly available models such as Llama2, Falcon, and StarCoder. Customers are now training LLMs of unprecedented size ranging from 1 billion to over 175 billion parameters. Training these LLMs requires significant compute resources and time as hundreds […]
( 8
min )
Structured data, defined as data following a fixed pattern such as information stored in columns within databases, and unstructured data, which lacks a specific form or pattern like text, images, or social media posts, both continue to grow as they are produced and consumed by various organizations. For instance, according to International Data Corporation (IDC), […]
( 13
min )
The post describes how you can overcome the challenges of retaining data ownership and preserving data privacy while using LLMs by deploying Protopia AI’s Stained Glass Transform to protect your data. Protopia AI has partnered with AWS to deliver the critical component of data protection and ownership for secure and efficient enterprise adoption of generative AI. This post outlines the solution and demonstrates how it can be used in AWS for popular enterprise use cases like Retrieval Augmented Generation (RAG) and with state-of-the-art LLMs like Llama 2.
( 12
min )
Many patients in low- and middle-income countries rely on facilitated online health communities for information and support. Discover how large language models can assist the facilitators and boost outcomes.
The post Exploring LLMs’ potential to help facilitators enhance online healthcare communities appeared first on Microsoft Research.
( 10
min )
Cecily Morrison and Karolina Pakėnaitė are collaborators on a research prototype designed to help members of the blind community find their personal items. Learn how the work is advancing an approach to empower people to shape their own AI experiences.
The post Collaborators: Teachable AI with Cecily Morrison and Karolina Pakėnaitė appeared first on Microsoft Research.
( 28
min )
‘Tis the season for friends, family and beautifully rendered Santa animations from this week’s In the NVIDIA Studio artist, 3D expert Božo Balov.
( 7
min )
A new, data-driven approach could lead to better solutions for tricky optimization problems like global package routing or power grid operation.
( 9
min )
Based on the standard VMAF implementation, we propose an implementation of
VMAF using the PyTorch framework. Comparisons with the standard implementation
(libvmaf) show a discrepancy of $\lesssim 10^{-2}$ VMAF units. We investigate
gradient computation when using VMAF as an objective function and demonstrate
that training with this function does not result in ill-behaved gradients. The
implementation is then used to train a preprocessing filter, whose performance
is demonstrated to be superior to that of the unsharp masking filter. The
resulting filter is also easy to implement and can be applied in video
processing tasks to improve video compression, as confirmed by the results of
numerical experiments.
( 2
min )
We consider a setting where a population of artificial learners is given, and
the objective is to optimize aggregate measures of performance, under
constraints on training resources. The problem is motivated by the study of
peer learning in human educational systems. In this context, we study natural
knowledge diffusion processes in networks of interacting artificial learners.
By `natural', we mean processes that reflect human peer learning where the
students' internal state and learning process is mostly opaque, and the main
degree of freedom lies in the formation of peer learning groups by a
coordinator who can potentially evaluate the learners before assigning them to
peer groups. Among other things, we empirically show that such processes indeed make
effective use of the training resources, and enable the design of modular
neural models that have the capacity to generalize without being prone to
overfitting noisy labels.
( 2
min )
In this paper we consider the numerical solution to the soft-margin support
vector machine optimization problem. This problem is typically solved using the
SMO algorithm, given the high computational complexity of traditional
optimization algorithms when dealing with large-scale kernel matrices. In this
work, we propose employing an NFFT-accelerated matrix-vector product using an
ANOVA decomposition for the feature space that is used within an interior point
method for the overall optimization problem. As this method requires the
solution of a linear system of saddle point form we suggest a preconditioning
approach that is based on low-rank approximations of the kernel matrix together
with a Krylov subspace solver. We compare the accuracy of the ANOVA-based
kernel with the default LIBSVM implementation. We investigate the performance
of the different preconditioners as well as the accuracy of the ANOVA kernel on
several large-scale datasets.
( 2
min )
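The Krylov-subspace component can be illustrated by a plain conjugate gradient solver for a symmetric positive definite system; the saddle-point structure, NFFT-accelerated matrix-vector products, and low-rank preconditioner in the paper are more involved than this minimal stdlib sketch:

```python
import math

def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A (dense, list-of-lists).

    Each iteration needs only one matrix-vector product, which is where an
    NFFT-accelerated product would be substituted in a kernel-matrix setting.
    """
    n = len(b)
    x = [0.0] * n
    r = b[:]              # residual b - A x, with x = 0
    p = r[:]              # search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(aij * pj for aij, pj in zip(row, p)) for row in A]
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if math.sqrt(rs_new) < tol:
            break
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```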
In this paper, we explore the use of uplink semantic communications with
the assistance of a UAV in order to improve data collection efficiency for
metaverse users in remote areas. To reduce the time for uplink data collection
while balancing the trade-off between reconstruction quality and computational
energy cost, we propose a hybrid action reinforcement learning (RL) framework
to make decisions on semantic model scale, channel allocation, transmission
power, and UAV trajectory. The variables are classified into discrete type and
continuous type, which are optimized by two different RL agents to generate the
combined action. Simulation results indicate that the proposed hybrid action
reinforcement learning framework can effectively improve the efficiency of
uplink semantic data collection under different parameter settings and
outperforms the benchmark scenarios.
( 2
min )
Bug reports are an essential aspect of software development, and it is
crucial to identify and resolve them quickly to ensure the consistent
functioning of software systems. Retrieving similar bug reports from an
existing database can help reduce the time and effort required to resolve bugs.
In this paper, we compared the effectiveness of semantic textual similarity
methods for retrieving similar bug reports based on a similarity score. We
explored several embedding models such as TF-IDF (Baseline), FastText, Gensim,
BERT, and ADA. We used the Software Defects Data containing bug reports for
various software projects to evaluate the performance of these models. Our
experimental results showed that BERT generally outperformed the rest of the
models regarding recall, followed by ADA, Gensim, FastText, and TF-IDF. Our
study provides insights into the effectiveness of different embedding methods
for retrieving similar bug reports and highlights the impact of selecting the
appropriate one for this task. Our code is available on GitHub.
( 2
min )
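The TF-IDF baseline with cosine similarity can be sketched in standard-library Python; the tokenisation, the particular idf smoothing, and the toy reports are illustrative, not the paper's exact setup:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weight vectors for a list of tokenised documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                      # document frequency per term
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # +1 keeps ubiquitous terms
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] / len(doc) * idf[t] for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv)

reports = [
    "app crashes on startup with null pointer".split(),
    "crash at startup null pointer exception".split(),
    "button color is wrong on settings page".split(),
]
vecs = tfidf_vectors(reports)
```

Ranking candidate reports by this score against a query report gives the retrieval baseline that the embedding models (FastText, Gensim, BERT, ADA) are compared against.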
Extracting the rules of real-world multi-agent behaviors is a current
challenge in various scientific and engineering fields. Biological agents
independently have limited observation and mechanical constraints; however,
most of the conventional data-driven models ignore such assumptions, resulting
in lack of biological plausibility and model interpretability for behavioral
analyses. Here we propose sequential generative models with partial observation
and mechanical constraints in a decentralized manner, which can model agents'
cognition and body dynamics, and predict biologically plausible behaviors. We
formulate this as a decentralized multi-agent imitation-learning problem,
leveraging binary partial observation and decentralized policy models based on
hierarchical variational recurrent neural networks with physical and
biomechanical penalties. Using real-world basketball and soccer datasets, we
show the effectiveness of our method in terms of the constraint violations,
long-term trajectory prediction, and partial observation. Our approach can be
used as a multi-agent simulator to generate realistic trajectories using
real-world data.
( 2
min )
The Shapley value is widely regarded as a trustworthy attribution metric.
However, when people use Shapley values to explain the attribution of input
variables of a deep neural network (DNN), it usually requires a very high
computational cost to approximate relatively accurate Shapley values in
real-world applications. Therefore, we propose a novel network architecture,
the HarsanyiNet, which makes inferences on the input sample and simultaneously
computes the exact Shapley values of the input variables in a single forward
propagation. The HarsanyiNet is designed on the theoretical foundation that the
Shapley value can be reformulated as the redistribution of Harsanyi
interactions encoded by the network.
( 2
min )
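For intuition, exact Shapley values can be computed by brute force for a small cooperative game, feasible only for a handful of players, which is precisely the exponential cost a single-forward-pass architecture aims to avoid; the toy value function below is our own:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values by enumerating all coalitions (small n only)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for coal in combinations(others, k):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (v(frozenset(coal) | {i}) - v(frozenset(coal)))
        phi[i] = total
    return phi

# Toy game: the value of a coalition is the square of its size.
v = lambda s: len(s) ** 2
phi = shapley_values([0, 1, 2], v)
```

For this symmetric game the efficiency axiom forces the values to sum to v(N) = 9, split equally, a useful sanity check for any approximation scheme.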
Learning disentangled causal representations is a challenging problem that
has gained significant attention recently due to its implications for
extracting meaningful information for downstream tasks. In this work, we define
a new notion of causal disentanglement from the perspective of independent
causal mechanisms. We propose ICM-VAE, a framework for learning causally
disentangled representations supervised by causally related observed labels. We
model causal mechanisms using learnable flow-based diffeomorphic functions to
map noise variables to latent causal variables. Further, to promote the
disentanglement of causal factors, we propose a causal disentanglement prior
that utilizes the known causal structure to encourage learning a causally
factorized distribution in the latent space. Under relatively mild conditions,
we provide theoretical results showing the identifiability of causal factors
and mechanisms up to permutation and elementwise reparameterization. We
empirically demonstrate that our framework induces highly disentangled causal
factors, improves interventional robustness, and is compatible with
counterfactual generation.
( 2
min )
Empirical studies have widely demonstrated that neural networks are highly
sensitive to small, adversarial perturbations of the input. The worst-case
robustness against these so-called adversarial examples can be quantified by
the Lipschitz constant of the neural network. In this paper, we study upper and
lower bounds for the Lipschitz constant of random ReLU neural networks.
Specifically, we assume that the weights and biases follow a generalization of
the He initialization, where general symmetric distributions for the biases are
permitted. For shallow neural networks, we characterize the Lipschitz constant
up to an absolute numerical constant. For deep networks with fixed depth and
sufficiently large width, our established bounds differ by a factor that is
logarithmic in the width.
( 2
min )
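The two bound types studied here can be sketched numerically: the product of the layers' spectral norms upper-bounds the Lipschitz constant, while the gradient norm at any input lower-bounds it. The network below uses plain He-style random weights as an illustration (the paper's analysis covers generalized He initializations with symmetric bias distributions):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random ReLU network with He-style weights (illustrative setup).
widths = [10, 64, 64, 1]
Ws = [rng.normal(0.0, np.sqrt(2.0 / m), size=(n, m))
      for m, n in zip(widths[:-1], widths[1:])]

def grad_norm(x):
    """Norm of the network's (sub)gradient at x: a Lipschitz lower bound."""
    acts, h = [], x
    for W in Ws[:-1]:
        h = W @ h
        acts.append(h > 0)          # ReLU activity pattern
        h = np.maximum(h, 0.0)
    J = Ws[-1]
    for W, a in zip(reversed(Ws[:-1]), reversed(acts)):
        J = (J * a) @ W             # chain rule through the masked layer
    return np.linalg.norm(J)

# Upper bound: product of the layers' spectral norms.
upper = np.prod([np.linalg.norm(W, 2) for W in Ws])
# Lower bound: best gradient norm found over random inputs.
lower = max(grad_norm(rng.normal(size=widths[0])) for _ in range(100))
```

The paper's contribution is to quantify how far apart these two bounds can be for random networks, e.g. a gap logarithmic in the width for deep networks of fixed depth.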
In this paper, we put forth a novel framework (named ``RYU'') for the
construction of ``safe'' balls, i.e. regions that provably contain the dual
solution of a target optimization problem. We concentrate on the standard setup
where the cost function is the sum of two terms: a closed, proper, convex
Lipschitz-smooth function and a closed, proper, convex function. The RYU
framework is shown to generalize or improve upon all the results proposed in
the last decade for the considered family of optimization problems.
( 2
min )
Graph contrastive learning has shown great promise when labeled data is
scarce but large unlabeled datasets are available. However, it often does not
take uncertainty estimation into account. We show that a variational Bayesian
neural network approach can be used to improve not only the uncertainty
estimates but also the downstream performance on semi-supervised
node-classification tasks. Moreover, we propose a new measure of uncertainty
for contrastive learning based on the disagreement in likelihood due to
different positive samples.
( 2
min )
We present an efficient parameter-free approach for statistical learning from
corrupted training sets. We identify corrupted and non-corrupted samples using
latent Bernoulli variables, and therefore formulate the robust learning problem
as maximization of the likelihood where latent variables are marginalized out.
The resulting optimization problem is solved via variational inference using an
efficient Expectation-Maximization based method. The proposed approach improves
over the state-of-the-art by automatically inferring the corruption level and
identifying outliers, while adding minimal computational overhead. We
demonstrate our robust learning method on a wide variety of machine learning
tasks including online learning and deep learning where it exhibits ability to
adapt to different levels of noise and attain high prediction accuracy.
( 2
min )
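A minimal 1-D sketch of the latent-Bernoulli idea, as a simplified stand-in for the paper's general formulation: each sample carries a latent corruption indicator, the corruption level is inferred rather than tuned, and the outlier component is assumed flat over the data range.

```python
import numpy as np

def robust_mean_em(x, n_iter=50):
    """EM for a two-component model: inliers N(mu, sigma^2) vs. a flat
    'corrupted' component. A latent Bernoulli variable marks each sample as
    an outlier and the corruption level pi is inferred automatically; this is
    a simplified 1-D stand-in for the paper's general formulation."""
    mu, sigma, pi = np.median(x), np.std(x), 0.1
    out_dens = 1.0 / (x.max() - x.min() + 1e-12)     # flat outlier density
    for _ in range(n_iter):
        # E-step: posterior probability that each sample is corrupted
        in_dens = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        r = pi * out_dens / (pi * out_dens + (1 - pi) * in_dens + 1e-300)
        # M-step: re-estimate inlier parameters with soft weights 1 - r
        w = 1.0 - r
        mu = np.sum(w * x) / np.sum(w)
        sigma = np.sqrt(np.sum(w * (x - mu) ** 2) / np.sum(w)) + 1e-12
        pi = r.mean()
    return mu, pi

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 900), rng.uniform(20, 30, 100)])
mu, pi = robust_mean_em(x)
# mu should sit near 0 despite 10% gross outliers, with pi near 0.1.
```

Marginalizing the Bernoulli variables out, as the abstract describes, is what the E-step's soft responsibilities implement.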
Canonical Correlation Analysis (CCA) has been widely applied to jointly embed
multiple views of data in a maximally correlated latent space. However, the
alignment between various data perspectives, which is required by traditional
approaches, is unclear in many practical cases. In this work we propose a new
framework, Aligned Canonical Correlation Analysis (ACCA), to address this
challenge by iteratively solving for the alignment and the multi-view
embedding.
( 2
min )
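For reference, the multi-view embedding step that ACCA alternates with alignment is classical CCA, which has a closed form via an SVD of the whitened cross-covariance. The sketch below shows only that inner step; the alignment re-estimation is not shown:

```python
import numpy as np

def cca(X, Y, k):
    """Classical CCA via SVD of the whitened cross-covariance. Assumes the
    rows of X and Y are already aligned samples; ACCA's extra ingredient
    (not shown here) is re-estimating that alignment between embeddings."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = len(X)
    Cxx = X.T @ X / n + 1e-6 * np.eye(X.shape[1])   # ridge for stability
    Cyy = Y.T @ Y / n + 1e-6 * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Lx = np.linalg.cholesky(np.linalg.inv(Cxx))      # Lx Lx^T = Cxx^{-1}
    Ly = np.linalg.cholesky(np.linalg.inv(Cyy))
    U, s, Vt = np.linalg.svd(Lx.T @ Cxy @ Ly)
    return Lx @ U[:, :k], Ly @ Vt[:k].T, s[:k]       # s: canonical correlations

# Two views sharing one latent signal z (toy data)
rng = np.random.default_rng(6)
z = rng.normal(size=(2000, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(2000, 1)), rng.normal(size=(2000, 2))])
Y = np.hstack([z + 0.1 * rng.normal(size=(2000, 1)), rng.normal(size=(2000, 2))])
Wx, Wy, s = cca(X, Y, 2)
# The shared signal shows up as a canonical correlation close to 1.
```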
This paper elucidates the challenges and opportunities inherent in
integrating data-driven methodologies into geotechnics, drawing inspiration
from the success of materials informatics. Highlighting the intricacies of soil
complexity, heterogeneity, and the lack of comprehensive data, the discussion
underscores the pressing need for community-driven database initiatives and
open science movements. By leveraging the transformative power of deep
learning, particularly in feature extraction from high-dimensional data and the
potential of transfer learning, we envision a paradigm shift towards a more
collaborative and innovative geotechnics field. The paper concludes with a
forward-looking stance, emphasizing the revolutionary potential brought about
by advanced computational tools like large language models in reshaping
geotechnics informatics.
( 2
min )
This paper aims to define, quantify, and analyze the feature complexity that
is learned by a DNN. We propose a generic definition for the feature
complexity. Given the feature of a certain layer in the DNN, our method
disentangles feature components of different complexity orders from the
feature. We further design a set of metrics to evaluate the reliability, the
effectiveness, and the significance of over-fitting of these feature
components. Furthermore, we successfully discover a close relationship between
the feature complexity and the performance of DNNs. As a generic mathematical
tool, the feature complexity and the proposed metrics can also be used to
analyze the success of network compression and knowledge distillation.
( 2
min )
Extracting the rules of real-world multi-agent behaviors is a current
challenge in various scientific and engineering fields. Biological agents
individually have limited observation and are subject to mechanical
constraints; however, most conventional data-driven models ignore these
assumptions, resulting in a lack of biological plausibility and model
interpretability for behavioral analyses. Here we propose sequential
generative models with partial observation
and mechanical constraints in a decentralized manner, which can model agents'
cognition and body dynamics, and predict biologically plausible behaviors. We
formulate this as a decentralized multi-agent imitation-learning problem,
leveraging binary partial observation and decentralized policy models based on
hierarchical variational recurrent neural networks with physical and
biomechanical penalties. Using real-world basketball and soccer datasets, we
show the effectiveness of our method in terms of constraint violations,
long-term trajectory prediction, and handling of partial observations. Our
approach can be
used as a multi-agent simulator to generate realistic trajectories using
real-world data.
( 2
min )
To enhance the gaming experience, studios and developers spend tremendous effort creating photorealistic, immersive in-game environments. But non-playable characters (NPCs) often get left behind. Many behave in ways that lack depth and realism, making their interactions repetitive and forgettable. Inworld AI is changing the game by using generative AI to drive NPC behaviors that are […]
( 6
min )
This is a guest post co-authored by Nafi Ahmet Turgut, Hasan Burak Yel, and Damla Şentürk from Getir. Established in 2015, Getir has positioned itself as the trailblazer in the sphere of ultrafast grocery delivery. This innovative tech company has revolutionized the last-mile delivery segment with its compelling offering of “groceries in minutes.” With a […]
( 7
min )
The recent upheavals at OpenAI and OpenAI’s Chief Scientist’s apprehensions regarding the “safety” of AI have ignited a fresh wave of concerns and fears about the march towards Artificial General Intelligence (AGI) and “Super Intelligence.” AI safety concerns the development of AI systems that are aligned with human values and do not cause harm to humans. Some…
The post A Different AI Scenario: AI and Justice in a Brave New World – Part 1 appeared first on Data Science Central.
( 22
min )
Climate hazards can cause major disasters when they occur simultaneously as
compound hazards. To understand the distribution of climate risk and inform
adaptation policies, scientists need to simulate a large number of physically
realistic and spatially coherent events. Current methods are limited by
computational constraints, and the probabilistic spatial distribution of
compound events is not given sufficient attention. The bottleneck in current
approaches lies in modelling the dependence structure between variables, as
inference on parametric models suffers from the curse of dimensionality.
Generative adversarial networks (GANs) are well-suited to such a problem due to
their ability to implicitly learn the distribution of data in high-dimensional
settings. We employ a GAN to model the dependence structure for daily maximum
wind speed, significant wave height, and total precipitation over the Bay of
Bengal, combining this with traditional extreme value theory for controlled
extrapolation of the tails. Once trained, the model can be used to efficiently
generate thousands of realistic compound hazard events, which can inform
climate risk assessments for climate adaptation and disaster preparedness. The
method developed is flexible and transferable to other multivariate and spatial
climate datasets.
( 2
min )
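The "controlled extrapolation of the tails" via extreme value theory typically means fitting a generalized Pareto distribution (GPD) to exceedances over a high threshold. A numpy-only method-of-moments sketch, with synthetic exponential samples standing in for GAN output (the moment estimator assumes shape xi < 1/2):

```python
import numpy as np

def fit_gpd_mom(exceedances):
    """Method-of-moments fit of a generalized Pareto distribution to
    threshold exceedances (valid when the shape parameter xi < 1/2)."""
    m, v = exceedances.mean(), exceedances.var()
    xi = 0.5 * (1.0 - m * m / v)          # shape
    sigma = 0.5 * m * (m * m / v + 1.0)   # scale
    return xi, sigma

def gpd_quantile(p, xi, sigma):
    """Inverse CDF of the GPD; used to extrapolate beyond observed maxima."""
    if abs(xi) < 1e-9:
        return -sigma * np.log(1 - p)
    return sigma / xi * ((1 - p) ** (-xi) - 1)

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=100_000)   # stand-in for generator samples
u = np.quantile(x, 0.95)                        # high threshold
xi, sigma = fit_gpd_mom(x[x > u] - u)
# Exponential tails correspond to xi ~ 0 and sigma ~ the exponential scale.
```

In the hybrid scheme the GAN supplies the spatial dependence structure while quantiles above the threshold are mapped through the fitted GPD.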
Inference of community structure in probabilistic graphical models may not be
consistent with fairness constraints when nodes have demographic attributes.
Certain demographics may be over-represented in some detected communities and
under-represented in others. This paper defines a novel $\ell_1$-regularized
pseudo-likelihood approach for fair graphical model selection. In particular,
we assume there is some community or clustering structure in the true
underlying graph, and we seek to learn a sparse undirected graph and its
communities from the data such that demographic groups are fairly represented
within the communities. In the case when the graph is known a priori, we
provide a convex semidefinite programming approach for fair community
detection. We establish the statistical consistency of the proposed method for
both a Gaussian graphical model and an Ising model for, respectively,
continuous and binary data, proving that our method can recover the graphs and
their fair communities with high probability.
( 2
min )
Analyzing large-scale time-series network data, such as social media and
email communications, poses a significant challenge in understanding social
dynamics, detecting anomalies, and predicting trends. In particular, the
scalability of graph analysis is a critical hurdle impeding progress in
large-scale downstream inference. To address this challenge, we introduce a
temporal encoder embedding method. This approach leverages ground-truth or
estimated vertex labels, enabling an efficient embedding of large-scale graph
data and the processing of billions of edges within minutes. Furthermore, this
embedding unveils a temporal dynamic statistic capable of detecting
communication pattern shifts across all levels, ranging from individual
vertices to vertex communities and the overall graph structure. We provide
theoretical support to confirm its soundness under random graph models, and
demonstrate its numerical advantages in capturing evolving communities and
identifying outliers. Finally, we showcase the practical application of our
approach by analyzing an anonymized time-series communication network from a
large organization spanning 2019-2020, enabling us to assess the impact of
Covid-19 on workplace communication patterns.
( 3
min )
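The core of a label-based encoder embedding can be sketched as a single matrix product of the adjacency matrix with normalized class indicators. This is an illustration of the general idea only; the paper's temporal variant adds per-time-window statistics on top of it.

```python
import numpy as np

def encoder_embedding(A, labels):
    """Label-based graph encoder embedding (a sketch of the general idea,
    not necessarily the paper's exact formulation): project the adjacency
    matrix onto class-indicator columns normalized by community size. Cost
    is one sparse-friendly matrix product, which is what makes billions of
    edges tractable."""
    n = A.shape[0]
    k = labels.max() + 1
    W = np.zeros((n, k))
    W[np.arange(n), labels] = 1.0
    W /= W.sum(axis=0, keepdims=True)   # normalize by community size
    return A @ W                        # Z[i, c]: avg connectivity of i to community c

# Two planted communities (toy example)
rng = np.random.default_rng(5)
labels = np.repeat([0, 1], 50)
P = np.where(labels[:, None] == labels[None, :], 0.5, 0.05)
A = (rng.random((100, 100)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T
Z = encoder_embedding(A, labels)
# Nodes connect more strongly to their own community on average.
```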
This paper studies the one-shot behavior of no-regret algorithms for
stochastic bandits. Although many algorithms are known to be asymptotically
optimal with respect to the expected regret, over a single run, their
pseudo-regret seems to follow one of two tendencies: it is either smooth or
bumpy. To measure this tendency, we introduce a new notion: the sliding regret,
that measures the worst pseudo-regret over a time-window of fixed length
sliding to infinity. We show that randomized methods (e.g. Thompson Sampling
and MED) have optimal sliding regret, while index policies, although possibly
asymptotically optimal for the expected regret, have the worst possible sliding
regret under regularity conditions on their index (e.g. UCB, UCB-V, KL-UCB,
MOSS, IMED etc.). We further analyze the average bumpiness of the pseudo-regret
of index policies via the regret of exploration, that we show to be suboptimal
as well.
( 2
min )
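A concrete finite-horizon reading of the sliding regret, computed on synthetic "smooth" and "bumpy" trajectories (illustrative data, not the output of real bandit runs):

```python
import numpy as np

def sliding_regret(pseudo_regret, window):
    """Worst pseudo-regret accumulated over any length-`window` interval of
    the trajectory (a finite-horizon reading of the paper's notion, which
    takes the window sliding to infinity)."""
    R = np.asarray(pseudo_regret, dtype=float)
    return max(R[t + window] - R[t] for t in range(len(R) - window))

T, w = 10_000, 100
# A "smooth" trajectory (logarithmic growth, as for randomized policies) vs.
# a "bumpy" one whose regret arrives in occasional unit-sized bursts.
smooth = np.log1p(np.arange(T))
bursts = np.zeros(T)
bursts[::1000] = 1.0
bumpy = np.cumsum(bursts)

# Away from the initial transient, the smooth policy's per-window regret
# vanishes, while the bumpy one keeps paying full bursts forever.
sr_smooth = sliding_regret(smooth[T // 2:], w)
sr_bumpy = sliding_regret(bumpy[T // 2:], w)
```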
Lipschitz continuity is a crucial functional property of any predictive
model that naturally governs its robustness and generalisation, as well as its
adversarial vulnerability. Contrary to other works that focus on obtaining
tighter bounds and developing different practical strategies to enforce certain
Lipschitz properties, we aim to thoroughly examine and characterise the
Lipschitz behaviour of Neural Networks. Thus, we carry out an empirical
investigation in a range of different settings (namely, architectures,
datasets, label noise, and more) by exhausting the limits of the simplest and
the most general lower and upper bounds. As a highlight of this investigation,
we showcase a remarkable fidelity of the lower Lipschitz bound, identify a
striking Double Descent trend in both upper and lower bounds to the Lipschitz
and explain the intriguing effects of label noise on function smoothness and
generalisation.
( 2
min )
The Fermat distance has been recently established as a useful tool for
machine learning tasks when a natural distance is not directly available to the
practitioner or to improve the results given by Euclidean distances by
exploiting the geometrical and statistical properties of the dataset. This
distance depends on a parameter $\alpha$ that greatly impacts the performance
of subsequent tasks. Ideally, the value of $\alpha$ should be large enough to
navigate the geometric intricacies inherent to the problem. At the same time,
should remain restrained enough to sidestep any deleterious ramifications
stemming from noise during the process of distance estimation. We study both
theoretically and through simulations how to select this parameter.
( 2
min )
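The sample Fermat distance is the shortest-path distance over the complete graph on the data with edge weights $\|x_i - x_j\|^\alpha$, which is where the tension in choosing $\alpha$ comes from: $\alpha = 1$ recovers the Euclidean distance, while larger $\alpha$ forces paths through dense regions and amplifies estimation noise. A small Floyd-Warshall sketch:

```python
import numpy as np

def fermat_distance(X, alpha):
    """Sample Fermat distance: shortest paths over the complete graph with
    edge weights ||x_i - x_j||**alpha (Floyd-Warshall, fine for small n)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.linalg.norm(diff, axis=-1) ** alpha
    for k in range(len(X)):                       # Floyd-Warshall relaxation
        D = np.minimum(D, D[:, k, None] + D[None, k, :])
    return D

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
D1 = fermat_distance(X, alpha=1.0)
D3 = fermat_distance(X, alpha=3.0)
# With alpha = 1 the triangle inequality makes direct edges optimal, so the
# Fermat distance reduces to the plain Euclidean distance.
```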
Virtually all machine learning tasks are characterized using some form of
loss function, and "good performance" is typically stated in terms of a
sufficiently small average loss, taken over the random draw of test data. While
optimizing for performance on average is intuitive, convenient to analyze in
theory, and easy to implement in practice, such a choice brings about
trade-offs. In this work, we survey and introduce a wide variety of
non-traditional criteria used to design and evaluate machine learning
algorithms, place the classical paradigm within the proper historical context,
and propose a view of learning problems which emphasizes the question of "what
makes for a desirable loss distribution?" in place of tacit use of the expected
loss.
( 2
min )
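One standard non-traditional criterion of the kind this survey covers is conditional value-at-risk (CVaR), which scores a model by its tail losses rather than its average (an illustrative sketch):

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Conditional value-at-risk: the mean loss in the worst (1 - alpha)
    tail. Two models with the same average loss can have very different
    CVaR, which is the sense in which the loss *distribution* matters."""
    q = np.quantile(losses, alpha)
    return losses[losses >= q].mean()

rng = np.random.default_rng(4)
a = rng.normal(1.0, 0.1, size=100_000)          # tight loss distribution
b = rng.exponential(scale=1.0, size=100_000)    # same mean, heavy tail
# a and b are comparable on average, but b is far worse in the tail.
```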
This paper presents a comprehensive comparative analysis of the performance
of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks
(QNN), juxtaposed against their classical counterparts: Equivariant Neural
Networks (ENN) and Deep Neural Networks (DNN). We evaluate the performance of
each network with two toy examples for a binary classification task, focusing
on model complexity (measured by the number of parameters) and the size of the
training data set. Our results show that the $\mathbb{Z}_2\times \mathbb{Z}_2$
EQNN and the QNN provide superior performance for smaller parameter sets and
modest training data samples.
( 2
min )
The ability to quickly build and deploy machine learning (ML) models is becoming increasingly important in today’s data-driven world. However, building ML models requires significant time, effort, and specialized expertise. From data collection and cleaning to feature engineering, model building, tuning, and deployment, ML projects often take months for developers to complete. And experienced data […]
( 10
min )
Launched in 2019, Amazon SageMaker Studio provides one place for all end-to-end machine learning (ML) workflows, from data preparation, building and experimentation, training, hosting, and monitoring. As we continue to innovate to increase data science productivity, we’re excited to announce the improved SageMaker Studio experience, which allows users to select the managed Integrated Development Environment (IDE) […]
( 6
min )
As organizations scale the adoption of machine learning (ML), they are looking for efficient and reliable ways to deploy new infrastructure and onboard teams to ML environments. One of the challenges is setting up authentication and fine-grained permissions for users based on their roles and activities. For example, MLOps engineers typically perform model deployment activities, […]
( 8
min )
PwR uses domain-specific languages to bridge communication between developers and AI tools. Learn how it can help simplify code creation and enhance software reliability and customization, no matter your coding expertise.
The post PwR: Using representations for AI-powered software development appeared first on Microsoft Research.
( 10
min )
Concept erasure in text-to-image diffusion models aims to disable pre-trained
diffusion models from generating images related to a target concept. To perform
reliable concept erasure, the properties of robustness and locality are
desirable. The former prevents the model from producing images associated with
the target concept for any paraphrased or learned prompts, while the latter
preserves the model's ability to generate images for non-target concepts. In
this paper, we propose Reliable Concept Erasing via Lightweight Erasers
(Receler), which learns a lightweight Eraser to perform concept erasing and
enhances locality and robustness with the proposed concept-localized
regularization and adversarial prompt learning, respectively. Comprehensive
quantitative and qualitative experiments with various concept prompts verify
the superiority of Receler over the previous erasing methods on the above two
desirable properties.
( 2
min )
Multivariate time series have many applications, from healthcare and
meteorology to life science. Although deep learning models have shown excellent
predictive performance for time series, they have been criticised for being
"black-boxes" or non-interpretable. This paper proposes a novel modular neural
network model for multivariate time series prediction that is interpretable by
construction. A recurrent neural network learns the temporal dependencies in
the data while an attention-based feature selection component selects the most
relevant features and suppresses redundant features used in the learning of the
temporal dependencies. A modular deep network is trained from the selected
features independently to show the users how features influence outcomes,
making the model interpretable. Experimental results show that this approach
can outperform state-of-the-art interpretable Neural Additive Models (NAM) and
variations thereof in both regression and classification of time series tasks,
achieving a predictive performance that is comparable to the top
non-interpretable methods for time series, LSTM and XGBoost.
( 2
min )
Judging whether a property is priced fairly is difficult for buyers and
sellers, since they usually do not have an objective view of the price
distribution for the overall market of interest. Drawing on data covering all
available rental properties in Manhattan as of September 2023, this paper
aims to strengthen our understanding of model residuals, specifically for
machine learning models that generalize well for the majority of the
distribution of a well-proportioned dataset. Most models treat deviations
from predicted values as mere inaccuracies; however, this paper proposes a
different vantage point: when generalizing to at least 75\% of the dataset, the
remaining deviations reveal significant insights. To harness these insights, we
introduce the Price Anomaly Score (PAS), a metric capable of capturing
boundaries between irregularly predicted prices. By combining relative pricing
discrepancies with statistical significance, the Price Anomaly Score (PAS)
offers a multifaceted view of rental valuations. This metric allows experts to
identify overpriced or underpriced properties within a dataset by aggregating
PAS values, then fine-tuning upper and lower boundaries to any threshold to set
indicators of choice.
( 3
min )
Traditional multi-view stereo (MVS) methods rely heavily on photometric and
geometric consistency constraints, but newer machine learning-based MVS methods
check geometric consistency across multiple source views only as a
post-processing step. In this paper, we present a novel approach that
explicitly encourages geometric consistency of reference view depth maps across
multiple source views at different scales during learning (see Fig. 1). We find
that adding this geometric consistency loss significantly accelerates learning
by explicitly penalizing geometrically inconsistent pixels, reducing the
training iteration requirements to nearly half that of other MVS methods. Our
extensive experiments show that our approach achieves a new state-of-the-art on
the DTU and BlendedMVS datasets, and competitive results on the Tanks and
Temples benchmark. To the best of our knowledge, GC-MVSNet is the first attempt
to enforce multi-view, multi-scale geometric consistency during learning.
( 2
min )
Text-To-Image (TTI) models, such as DALL-E and StableDiffusion, have
demonstrated remarkable prompt-based image generation capabilities.
Multilingual encoders may have a substantial impact on the cultural agency of
these models, as language is a conduit of culture. In this study, we explore
the cultural perception embedded in TTI models by characterizing culture across
three hierarchical tiers: cultural dimensions, cultural domains, and cultural
concepts. Based on this ontology, we derive prompt templates to unlock the
cultural knowledge in TTI models, and propose a comprehensive suite of
evaluation techniques, including intrinsic evaluations using the CLIP space,
extrinsic evaluations with a Visual-Question-Answer (VQA) model and human
assessments, to evaluate the cultural content of TTI-generated images. To
bolster our research, we introduce the CulText2I dataset, derived from four
diverse TTI models and spanning ten languages. Our experiments provide insights
regarding Do, What, Which and How research questions about the nature of
cultural encoding in TTI models, paving the way for cross-cultural applications
of these models.
( 2
min )
Hyperparameter Optimization (HPO) of Deep Learning-based models tends to be a
compute-intensive process, as it usually requires training the target model
with many different hyperparameter configurations. We show that
integrating model performance prediction with early stopping methods holds
great potential to speed up the HPO process of deep learning models. Moreover,
we propose a novel algorithm called Swift-Hyperband that can use either
classical or quantum support vector regression for performance prediction and
benefit from distributed High Performance Computing environments. This
algorithm is tested not only for the Machine-Learned Particle Flow model used
in High Energy Physics, but also for a wider range of target models from
domains such as computer vision and natural language processing.
Swift-Hyperband is shown to find comparable (or better) hyperparameters while
using fewer computational resources in all test cases.
( 2
min )
Tensor network (TN) representation is a powerful technique for computer
vision and machine learning. TN structure search (TN-SS) aims to search for a
customized structure to achieve a compact representation, which is a
challenging NP-hard problem. Recent "sampling-evaluation-based" methods require
sampling an extensive collection of structures and evaluating them one by one,
resulting in prohibitively high computational costs. To address this issue, we
propose a novel TN paradigm, named SVD-inspired TN decomposition (SVDinsTN),
which allows us to efficiently solve the TN-SS problem from a regularized
modeling perspective, eliminating the repeated structure evaluations. To be
specific, by inserting a diagonal factor for each edge of the fully-connected
TN, SVDinsTN allows us to calculate TN cores and diagonal factors
simultaneously, with the factor sparsity revealing a compact TN structure. In
theory, we prove a convergence guarantee for the proposed method. Experimental
results demonstrate that the proposed method achieves approximately 100 to 1000
times acceleration compared to the state-of-the-art TN-SS methods while
maintaining a comparable representation ability.
( 2
min )
The unstructured nature of data used in foundation model development is a
challenge to systematic analyses for making data use and documentation
decisions. From a Responsible AI perspective, these decisions often rely upon
understanding how people are represented in data. We propose a framework
designed to guide analysis of human representation in unstructured data and
identify downstream risks. We apply the framework in two toy examples using the
Common Crawl web text corpus (C4) and LAION-400M. We also propose a set of
hypothetical action steps in service of dataset use, development, and
documentation.
( 2
min )
Crop management decision support systems are specialized tools that reduce
the riskiness of farmers' revenue streams; they are especially valuable under
ongoing climate change and its impact on agricultural productivity.
Unfortunately, small farmers in India, who could greatly benefit from these
tools, do not have access to them. In this paper, we model an individual
greenhouse as a Markov Decision Process (MDP) and adapt Li and Li (2019)'s
Follow the Weighted Leader (FWL) online learning algorithm to offer crop
planning advice. We successfully produce utility-preserving cropping pattern
suggestions in simulations. When we compare against an offline planning
algorithm, we achieve the same cumulative revenue with greatly reduced runtime.
( 2
min )
Generative models can produce impressively realistic images. This paper
demonstrates that generated images have geometric features different from those
of real images. We build a set of collections of generated images, prequalified
to fool simple, signal-based classifiers into believing they are real. We then
show that prequalified generated images can be identified reliably by
classifiers that only look at geometric properties. We use three such
classifiers. All three classifiers are denied access to image pixels, and look
only at derived geometric features. The first classifier looks at the
perspective field of the image, the second looks at lines detected in the
image, and the third looks at relations between detected objects and shadows.
Our procedure detects generated images more reliably than SOTA local signal
based detectors, for images from a number of distinct generators. Saliency maps
suggest that the classifiers can identify geometric problems reliably. We
conclude that current generators cannot reliably reproduce geometric properties
of real images.
( 2
min )
Model-agnostic anomaly detection is one of the promising approaches in the
search for new beyond the standard model physics. In this paper, we present
Set-VAE, a particle-based variational autoencoder (VAE) anomaly detection
algorithm. We demonstrate a 2x signal efficiency gain compared with traditional
subjettiness-based jet selection. Furthermore, with an eye to the future
deployment to trigger systems, we propose the CLIP-VAE, which reduces the
inference-time cost of anomaly detection by using the KL-divergence loss as the
anomaly score, resulting in a 2x acceleration in latency and reducing the
caching requirement.
( 2
min )
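The KL-divergence anomaly score that makes CLIP-VAE cheap at inference time has a closed form for a diagonal-Gaussian encoder posterior against a standard-normal prior, so no decoder pass is needed. A sketch with made-up encoder outputs:

```python
import numpy as np

def kl_anomaly_score(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ) per sample.
    Using this as the anomaly score skips the decoder entirely, which is
    where the inference-time saving in a CLIP-VAE-style setup comes from."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)

# Hypothetical encoder outputs for two jets (illustrative numbers only):
mu = np.array([[0.1, -0.2, 0.0],     # close to the prior -> low score
               [3.0,  2.5, -4.0]])   # far from the prior -> high score
logvar = np.zeros_like(mu)
scores = kl_anomaly_score(mu, logvar)
```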
Evaluating the accuracy of outputs generated by Large Language Models (LLMs)
is especially important in the climate science and policy domain. We introduce
the Expert Confidence in Climate Statements (ClimateX) dataset, a novel,
curated, expert-labeled dataset consisting of 8094 climate statements collected
from the latest Intergovernmental Panel on Climate Change (IPCC) reports,
labeled with their associated confidence levels. Using this dataset, we show
that recent LLMs can classify human expert confidence in climate-related
statements, especially in a few-shot learning setting, but with limited (up to
47%) accuracy. Overall, models exhibit consistent and significant
over-confidence on low and medium confidence statements. We highlight
implications of our results for climate communication, LLM evaluation
strategies, and the use of LLMs in information retrieval systems.
( 2
min )
Although much work has been done on explainability in the computer vision and
natural language processing (NLP) fields, much remains to be done to explain
methods applied to time series, as time series by nature cannot be understood
at first sight. In this paper, we present a Deep Neural Network
(DNN) in a teacher-student architecture (distillation model) that offers
interpretability in time-series classification tasks. The explainability of our
approach is based on transforming the time series to 2D plots and applying
image highlight methods (such as LIME and GradCam), making the predictions
interpretable. At the same time, the proposed approach offers increased
accuracy competing with the baseline model with the trade-off of increasing the
training time.
( 2
min )
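The abstract does not specify its series-to-image transform; one standard choice that makes a time series amenable to image explainers such as LIME or GradCAM is a recurrence plot (an illustrative choice, not necessarily the paper's):

```python
import numpy as np

def recurrence_plot(x, eps=None):
    """Turn a 1-D series into a 2-D image: R[i, j] = 1 when x_i and x_j lie
    within eps of each other. The resulting image can then be fed to
    image-highlighting explainers such as LIME or GradCAM."""
    x = np.asarray(x, dtype=float)
    D = np.abs(x[:, None] - x[None, :])
    if eps is None:
        eps = 0.1 * (x.max() - x.min())
    return (D <= eps).astype(float)

t = np.linspace(0, 4 * np.pi, 200)
img = recurrence_plot(np.sin(t))
# Periodicity in the series shows up as diagonal banding in the image.
```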
Astronomical transients, such as supernovae and other rare stellar
explosions, have been instrumental in some of the most significant discoveries
in astronomy. New astronomical sky surveys will soon record unprecedented
numbers of transients as sparsely and irregularly sampled multivariate time
series. To improve our understanding of the physical mechanisms of transients
and their progenitor systems, early-time measurements are necessary.
Prioritizing the follow-up of transients based on their age along with their
class is crucial for new surveys. To meet this demand, we present the first
method of predicting the age of transients in real-time from multi-wavelength
time-series observations. We build a Bayesian probabilistic recurrent neural
network. Our method can accurately predict the age of a transient with robust
uncertainties as soon as it is initially triggered by a survey telescope. This
work will be essential for the advancement of our understanding of the numerous
young transients being detected by ongoing and upcoming astronomical surveys.
( 2
min )
We introduce a diffusion-based generative model to describe the distribution
of galaxies in our Universe directly as a collection of points in 3-D space
(coordinates) optionally with associated attributes (e.g., velocities and
masses), without resorting to binning or voxelization. The custom diffusion
model can be used both for emulation, reproducing essential summary statistics
of the galaxy distribution, as well as inference, by computing the conditional
likelihood of a galaxy field. We demonstrate a first application to massive
dark matter haloes in the Quijote simulation suite. This approach can be
extended to enable a comprehensive analysis of cosmological data, circumventing
limitations inherent to summary-statistic-based as well as neural
simulation-based inference methods.
( 2
min )
The aim of this short note is to show that Denoising Diffusion Probabilistic
Model DDPM, a non-homogeneous discrete-time Markov process, can be represented
by a time-homogeneous continuous-time Markov process observed at non-uniformly
sampled discrete times. Surprisingly, this continuous-time Markov process is
the well-known and well-studied Ornstein-Uhlenbeck (OU) process, which was
developed in the 1930s for studying Brownian particles in harmonic potentials. We
establish the formal equivalence between DDPM and the OU process using its
analytical solution. We further demonstrate that the design problem of the
noise scheduler for non-homogeneous DDPM is equivalent to designing observation
times for the OU process. We present several heuristic designs for observation
times based on principled quantities such as auto-variance and Fisher
Information and connect them to ad hoc noise schedules for DDPM. Interestingly,
we show that the Fisher-Information-motivated schedule corresponds exactly to
the cosine schedule, which was developed without any theoretical foundation but is
the current state-of-the-art noise schedule.
( 2
min )
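The correspondence above can be made concrete in a few lines. This is a minimal sketch, assuming the standardized OU process dX = -X dt + sqrt(2) dW, whose marginal at time tau matches the DDPM forward marginal x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps exactly when abar_t = exp(-2 tau_t); the cosine-schedule parameters are the usual illustrative choices, not values from this note.

```python
import numpy as np

# Map between DDPM noise schedules and OU observation times under the
# standardized OU process dX = -X dt + sqrt(2) dW: abar_t = exp(-2 * tau_t).

def alphabar_from_times(tau):
    """OU observation times -> DDPM cumulative signal coefficients."""
    return np.exp(-2.0 * np.asarray(tau))

def times_from_alphabar(abar):
    """Invert the map: recover observation times from a noise schedule."""
    return -0.5 * np.log(np.asarray(abar))

# Example: view the cosine schedule as a set of non-uniform observation times.
T, s = 1000, 0.008
t = np.arange(T + 1) / T
abar_cosine = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
abar_cosine = abar_cosine / abar_cosine[0]          # normalize so abar_0 = 1
tau_cosine = times_from_alphabar(np.clip(abar_cosine, 1e-12, 1.0))
```

The observation times start at zero and grow increasingly sparse toward the end of the schedule, which is the sense in which designing a noise schedule is equivalent to designing observation times.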
Diffusion models excel at generating photo-realistic images but come with
significant computational costs in both training and sampling. While various
techniques address these computational challenges, a less-explored issue is
designing an efficient and adaptable network backbone for iterative refinement.
Current options like U-Net and Vision Transformer often rely on
resource-intensive deep networks and lack the flexibility needed for generating
images at variable resolutions or with a smaller network than used in training.
This study introduces LEGO bricks, which seamlessly integrate Local-feature
Enrichment and Global-content Orchestration. These bricks can be stacked to
create a test-time reconfigurable diffusion backbone, allowing selective
skipping of bricks to reduce sampling costs and generate higher-resolution
images than the training data. LEGO bricks enrich local regions with an MLP and
transform them using a Transformer block while maintaining a consistent
full-resolution image across all bricks. Experimental results demonstrate that
LEGO bricks enhance training efficiency, expedite convergence, and facilitate
variable-resolution image generation while maintaining strong generative
performance. Moreover, LEGO significantly reduces sampling time compared to
other methods, establishing it as a valuable enhancement for diffusion models.
( 2
min )
Causal inference studies whether the presence of a variable influences an
observed outcome. As measured by quantities such as the "average treatment
effect," this paradigm is employed across numerous biological fields, from
vaccine and drug development to policy interventions. Unfortunately, most of
these methods are limited to univariate outcomes. Our work
generalizes causal estimands to outcomes with any number of dimensions or any
measurable space, and formulates traditional causal estimands for nominal
variables as causal discrepancy tests. We propose a simple technique for
adjusting universally consistent conditional independence tests and prove that
these tests are universally consistent causal discrepancy tests. Numerical
experiments illustrate that our method, Causal CDcorr, leads to improvements in
both finite sample validity and power when compared to existing strategies. Our
methods are all open source and available at github.com/ebridge2/cdcorr.
( 2
min )
There are a number of available methods for selecting whom to prioritize for
treatment, including ones based on treatment effect estimation, risk scoring,
and hand-crafted rules. We propose rank-weighted average treatment effect
(RATE) metrics as a simple and general family of metrics for comparing and
testing the quality of treatment prioritization rules. RATE metrics are
agnostic as to how the prioritization rules were derived, and only assess how
well they identify individuals that benefit the most from treatment. We define
a family of RATE estimators and prove a central limit theorem that enables
asymptotically exact inference in a wide variety of randomized and
observational study settings. RATE metrics subsume a number of existing
metrics, including the Qini coefficient, and our analysis directly yields
inference methods for these metrics. We showcase RATE in the context of a
number of applications, including optimal targeting of aspirin to stroke
patients.
( 2
min )
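The idea behind RATE can be illustrated with a plug-in estimate for a randomized experiment: rank units by a prioritization score, compare the ATE among the top-q fraction to the overall ATE (the targeting operator characteristic, TOC), and average over q. The sketch below is an illustrative plug-in version with uniform weights (the AUTOC variant), not the paper's exact estimator or its inference procedure; all names and the toy data are assumptions.

```python
import numpy as np

def toc(q, scores, w, y):
    """ATE among the top-q fraction by score, minus the overall ATE."""
    n = len(scores)
    top = np.argsort(-scores)[: max(1, int(np.ceil(q * n)))]
    ate = lambda idx: y[idx][w[idx] == 1].mean() - y[idx][w[idx] == 0].mean()
    return ate(top) - ate(np.arange(n))

def rate_autoc(scores, w, y, grid=None):
    """Plug-in AUTOC: the TOC curve averaged uniformly over q."""
    grid = np.linspace(0.1, 1.0, 10) if grid is None else grid
    return np.mean([toc(q, scores, w, y) for q in grid])

# Toy data: the true effect grows with covariate x, so prioritizing by x
# should yield a positive RATE, while a random rule should score near zero.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(size=n)
w = rng.integers(0, 2, size=n)                  # randomized treatment
y = 2.0 * x * w + rng.normal(scale=0.1, size=n)
good = rate_autoc(x, w, y)                      # informative rule
bad = rate_autoc(rng.uniform(size=n), w, y)     # uninformative rule
```

A rule that concentrates treatment on the units that benefit most gets a large positive value; a rule no better than random gets a value near zero, which is exactly the comparison RATE metrics formalize.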
Synthetic data (SD) have garnered attention as a privacy enhancing
technology. Unfortunately, there is no standard for quantifying their degree of
privacy protection. In this paper, we discuss proposed quantification
approaches. This contributes to the development of SD privacy standards;
stimulates multi-disciplinary discussion; and helps SD researchers make
informed modeling and evaluation decisions.
( 2
min )
We believe generative AI has the potential over time to transform virtually every customer experience we know. The number of companies launching generative AI applications on AWS is substantial and building quickly, including adidas, Booking.com, Bridgewater Associates, Clariant, Cox Automotive, GoDaddy, and LexisNexis Legal & Professional, to name just a few. Innovative startups like Perplexity […]
( 26
min )
Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning (ML) models at scale. SageMaker makes it easy to deploy models into production directly through API calls to the service. Models are packaged into containers for robust and scalable deployments. SageMaker provides […]
( 12
min )
Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and effortlessly build, train, and deploy machine learning (ML) models at any scale. SageMaker makes it straightforward to deploy models into production directly through API calls to the service. Models are packaged into containers for robust and scalable deployments. Although […]
( 15
min )
Today, we are excited to announce support for Code Editor, a new integrated development environment (IDE) option in Amazon SageMaker Studio. Code Editor is based on Code-OSS, Visual Studio Code Open Source, and provides access to the familiar environment and tools of the popular IDE that machine learning (ML) developers know and love, fully integrated […]
( 9
min )
As democratization of foundation models (FMs) becomes more prevalent and demand for AI-augmented services increases, software as a service (SaaS) providers are looking to use machine learning (ML) platforms that support multiple tenants—for data scientists internal to their organization and external customers. More and more companies are realizing the value of using FMs to generate […]
( 17
min )
As organizations deploy models to production, they are constantly looking for ways to optimize the performance of their foundation models (FMs) running on the latest accelerators, such as AWS Inferentia and GPUs, so they can reduce their costs and decrease response latency to provide the best experience to end-users. However, some FMs don’t fully utilize […]
( 13
min )
Amazon SageMaker makes it straightforward to deploy machine learning (ML) models for real-time inference and offers a broad selection of ML instances spanning CPUs and accelerators such as AWS Inferentia. As a fully managed service, you can scale your model deployments, minimize inference costs, and manage your models more effectively in production with reduced operational […]
( 6
min )
Amazon SageMaker Canvas is a no-code workspace that enables analysts and citizen data scientists to generate accurate machine learning (ML) predictions for their business needs. Starting today, SageMaker Canvas supports advanced model build configurations such as selecting a training method (ensemble or hyperparameter optimization) and algorithms, customizing the training and validation data split ratio, and […]
( 12
min )
Building foundation models (FMs) requires building, maintaining, and optimizing large clusters to train models with tens to hundreds of billions of parameters on vast amounts of data. Creating a resilient environment that can handle failures and environmental changes without losing days or weeks of model training progress is an operational challenge that requires you to […]
( 10
min )
Digital publishers are continuously looking for ways to streamline and automate their media workflows to generate and publish new content as rapidly as they can, but without foregoing quality. Adding images to capture the essence of text can improve the reading experience. Machine learning techniques can help you discover such images. “A striking image is […]
( 10
min )
The risks associated with generative AI have been well-publicized. Toxicity, bias, escaped PII, and hallucinations negatively impact an organization’s reputation and damage customer trust. Research shows that not only do risks for bias and toxicity transfer from pre-trained foundation models (FM) to task-specific generative AI services, but that tuning an FM for specific tasks, on […]
( 13
min )
Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. With this integration, SageMaker Canvas provides customers with an end-to-end no-code workspace to prepare data, build and use ML and […]
( 7
min )
In the last few years Large Language Models (LLMs) have risen to prominence as outstanding tools capable of understanding, generating and manipulating text with unprecedented proficiency. Their potential applications span from conversational agents to content generation and information retrieval, holding the promise of revolutionizing all industries. However, harnessing this potential while ensuring the responsible and […]
( 15
min )
In today’s rapidly evolving landscape of artificial intelligence, deep learning models have found themselves at the forefront of innovation, with applications spanning computer vision (CV), natural language processing (NLP), and recommendation systems. However, the increasing cost associated with training and fine-tuning these models poses a challenge for enterprises. This cost is primarily driven by the […]
( 8
min )
In November 2023, MarketsandMarkets announced the publication of its Knowledge Graph Market report. In its announcement, M&M estimated the 2023 global knowledge graph market at $0.9 billion, forecasting market growth to $2.4 billion by 2028, a compound annual growth rate of 21.9 percent. M&M also listed these 12 “key players” in its announcement: I haven’t…
The post A few large enterprise software provider strategies for the knowledge graph market appeared first on Data Science Central.
( 21
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Predicting the infiltration of Glioblastoma (GBM) from medical MRI scans is
crucial for understanding tumor growth dynamics and designing personalized
radiotherapy treatment plans. Mathematical models of GBM growth can complement
the data in the prediction of spatial distributions of tumor cells. However,
this requires estimating patient-specific parameters of the model from clinical
data, which is a challenging inverse problem due to limited temporal data and
the limited time between imaging and diagnosis. This work proposes a method
that uses Physics-Informed Neural Networks (PINNs) to estimate patient-specific
parameters of a reaction-diffusion PDE model of GBM growth from a single 3D
structural MRI snapshot. PINNs embed both the data and the PDE into a loss
function, thus integrating theory and data. Key innovations include the
identification and estimation of characteristic non-dimensional parameters, a
pre-training step that utilizes the non-dimensional parameters, and a
fine-tuning step to determine the patient-specific parameters. Additionally,
the diffuse domain method is employed to handle the complex brain geometry
within the PINN framework. Our method is validated both on synthetic and
patient datasets, and shows promise for real-time parametric inference in the
clinical setting for personalized GBM treatment.
( 2
min )
Electroanatomical mapping is a technique used in cardiology to create a
detailed 3D map of the electrical activity in the heart. It is useful for
diagnosis, treatment planning and real time guidance in cardiac ablation
procedures to treat arrhythmias like atrial fibrillation. A probabilistic
machine learning model trained on a library of CT/MRI scans of the heart can be
used during electroanatomical mapping to generate a patient-specific 3D model
of the chamber being mapped. The use of probabilistic machine learning models
under a Bayesian framework provides a way to quantify uncertainty in results
and a natural framework for interpreting the model. Here we
introduce a Bayesian approach to surface reconstruction of cardiac chamber
models from a sparse 3D point cloud data acquired during electroanatomical
mapping. We show how probabilistic graphical models trained on segmented CT/MRI
data can be used to generate cardiac chamber models from few acquired locations
thereby reducing procedure time and X-ray exposure. We also show how they
provide insight into what the neural network learns from the segmented CT/MRI
images used to train it, lending explainability to the resulting cardiac
chamber models.
( 2
min )
We study the sample complexity of identifying the pure strategy Nash
equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally,
we are given a stochastic model where any learner can sample an entry $(i,j)$
of the input matrix $A\in[-1,1]^{n\times m}$ and observe $A_{i,j}+\eta$ where
$\eta$ is a zero-mean 1-sub-Gaussian noise. The aim of the learner is to
identify the PSNE of $A$, whenever it exists, with high probability while
taking as few samples as possible. Zhou et al. (2017) present an
instance-dependent sample complexity lower bound that depends only on the
entries in the row and column in which the PSNE lies. We design a near-optimal
algorithm whose sample complexity matches the lower bound, up to log factors.
The problem of identifying the PSNE also generalizes the problem of pure
exploration in stochastic multi-armed bandits and dueling bandits, and our
result matches the optimal bounds, up to log factors, in both the settings.
( 2
min )
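In the noiseless version of the problem above, a PSNE of the zero-sum game is a saddle point of the payoff matrix: an entry that is the maximum of its column and the minimum of its row (assuming the row player maximizes and the column player minimizes). The sketch below finds it by exhaustive check; the paper's sampling algorithm must instead decide this from noisy entry observations with as few samples as possible.

```python
import numpy as np

def find_psne(A):
    """Return (i, j) of a saddle point of A, or None if no PSNE exists."""
    A = np.asarray(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            # Best response for both players simultaneously:
            # max of its column (row player) and min of its row (column player).
            if A[i, j] == A[:, j].max() and A[i, j] == A[i, :].min():
                return (i, j)
    return None

saddle = find_psne([[0.3, 0.2, 0.9],
                    [0.1, 0.0, 0.4],
                    [0.2, 0.1, 0.5]])
```

Note that a PSNE need not exist: for the matching-pennies matrix [[0, 1], [1, 0]] the function returns None, which is why the identification problem is posed as "whenever it exists."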
Large language models (LLMs) aligned to human preferences via reinforcement
learning from human feedback (RLHF) underpin many commercial applications of
LLM technology. Despite this, the impacts of RLHF on LLM internals remain
opaque. We propose a novel method for interpreting implicit reward models
(IRMs) in LLMs learned through RLHF. Our approach trains pairs of autoencoders
on activations from a base LLM and its RLHF-tuned variant. Through a comparison
of autoencoder hidden spaces, we identify features that reflect the accuracy of
the learned IRM. To illustrate our method, we fine-tune an LLM via RLHF to
learn a token-utility mapping and maximize the aggregate utility of generated
text. This is the first application of sparse autoencoders to interpreting
IRMs. Our method provides an abstract approximation of reward integrity and
holds promise for measuring alignment between specified objectives and learned
model behaviors.
( 2
min )
Many problems in machine learning can be formulated as solving
entropy-regularized optimal transport on the space of probability measures. The
canonical approach involves the Sinkhorn iterates, renowned for their rich
mathematical properties. Recently, the Sinkhorn algorithm has been recast
within the mirror descent framework, thus benefiting from classical
optimization theory insights. Here, we build upon this result by introducing a
continuous-time analogue of the Sinkhorn algorithm. This perspective allows us
to derive novel variants of Sinkhorn schemes that are robust to noise and bias.
Moreover, our continuous-time dynamics not only generalize but also offer a
unified perspective on several recently discovered dynamics in machine learning
and mathematics, such as the "Wasserstein mirror flow" of (Deb et al. 2023) or
the "mean-field Schrödinger equation" of (Claisse et al. 2023).
( 2
min )
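For reference, the canonical Sinkhorn iterates mentioned above can be sketched in a few lines: alternately rescale the rows and columns of the Gibbs kernel K = exp(-C / eps) until both marginals are matched. This is the discrete-time scheme that the paper's continuous-time dynamics generalize; the cost matrix, regularization strength, and iteration count below are illustrative.

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, n_iter=500):
    """Entropy-regularized transport plan between discrete measures a and b."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)         # match column marginals
        u = a / (K @ v)           # match row marginals
    return u[:, None] * K * v[None, :]

# Toy example: transport between two 3-point measures on a line.
C = np.abs(np.subtract.outer(np.arange(3.0), np.arange(3.0)))
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])
P = sinkhorn(C, a, b)
```

After convergence the plan's row sums equal a and its column sums equal b, up to the linear convergence tolerance of the iterates.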
In climate simulations, small-scale processes shape ocean dynamics but remain
computationally expensive to resolve directly. For this reason, their
contributions are commonly approximated using empirical parameterizations,
which lead to significant errors in long-term projections. In this work, we
develop parameterizations based on Fourier Neural Operators, showcasing their
accuracy and generalizability in comparison to other approaches. Finally, we
discuss the potential and limitations of neural networks operating in the
frequency domain, paving the way for future investigation.
( 2
min )
Missing data is a common problem in practical settings. Various imputation
methods have been developed to deal with missing data. However, even though the
label is usually available in the training data, the common practice of
imputation usually only relies on the input and ignores the label. In this
work, we illustrate how stacking the label into the input can significantly
improve the imputation of the input. In addition, we propose a classification
strategy that initializes the predicted test label with missing values and
stacks the label with the input for imputation. This allows imputing the label
and the input at the same time. The technique can also handle training data
with missing labels without any prior imputation and is applicable to
continuous, categorical, or mixed-type data. Experiments show promising results
in terms of accuracy.
( 2
min )
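The label-stacking idea above can be sketched in pure NumPy (assuming a single regression-imputation pass, not the paper's full procedure): when imputing a feature, regress it on the remaining features with the label appended as an extra column, so the imputation exploits the label information that common practice ignores. The toy data and helper below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
y = (x1 > 0).astype(float)        # label carries signal about x1
x2 = rng.normal(size=n)           # uninformative second feature
x1_obs = x1.copy()
miss = rng.random(n) < 0.3        # 30% of x1 missing
x1_obs[miss] = np.nan

def impute(target, predictors, miss):
    """Least-squares imputation of target's missing entries from predictors."""
    Z = np.column_stack([np.ones(len(target))] + predictors)
    beta, *_ = np.linalg.lstsq(Z[~miss], target[~miss], rcond=None)
    out = target.copy()
    out[miss] = Z[miss] @ beta
    return out

without_label = impute(x1_obs, [x2], miss)        # common practice
with_label = impute(x1_obs, [x2, y], miss)        # label stacked into input
```

On this toy data the label-stacked imputation is markedly more accurate, since the label pins down the sign of the missing feature while the second feature carries no information.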
Rodney Brooks, co-founder of iRobot, kicks off an MIT symposium on the promise and potential pitfalls of increasingly powerful AI tools like ChatGPT.
( 12
min )
Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. Amazon SageMaker notebook jobs allow data scientists to run their notebooks on demand or on a schedule with a few clicks in SageMaker Studio. With this launch, you can programmatically run notebooks as jobs […]
( 11
min )
The rapid growth of generative AI brings promising new innovation, and at the same time raises new challenges. These challenges include some that were common before generative AI, such as bias and explainability, and new ones unique to foundation models (FMs), including hallucination and toxicity. At AWS, we are committed to developing generative AI responsibly, […]
( 9
min )
Since launching in June 2023, the AWS Generative AI Innovation Center team of strategists, data scientists, machine learning (ML) engineers, and solutions architects have worked with hundreds of customers worldwide, and helped them ideate, prioritize, and build bespoke solutions that harness the power of generative AI. Customers worked closely with us to prioritize use cases, […]
( 4
min )
Mira Murati as CTO, Greg Brockman returns as President. Read messages from CEO Sam Altman and board chair Bret Taylor.
( 5
min )
The magnitude of a metric space was recently established as a novel
invariant, providing a measure of the `effective size' of a space across
multiple scales. By capturing both geometrical and topological properties of
data, magnitude is poised to address challenges in unsupervised representation
learning tasks. We formalise a novel notion of dissimilarity between magnitude
functions of finite metric spaces and use them to derive a quality measure for
dimensionality reduction tasks. Our measure is provably stable under
perturbations of the data, can be efficiently calculated, and enables a
rigorous multi-scale comparison of embeddings. We show the utility of our
measure in an experimental suite that comprises different domains and tasks,
including the comparison of data visualisations.
( 2
min )
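For a finite metric space given by a distance matrix D, the magnitude function referenced above has a short closed form: at scale t, form the similarity matrix Z = exp(-t D), solve Z w = 1 for the weighting w, and sum its entries. This is a minimal sketch of that standard definition, not of the paper's dissimilarity measure between magnitude functions.

```python
import numpy as np

def magnitude(D, t):
    """Magnitude of the finite metric space D scaled by t."""
    Z = np.exp(-t * np.asarray(D))              # similarity matrix
    w = np.linalg.solve(Z, np.ones(len(D)))     # magnitude weighting
    return w.sum()

def magnitude_function(D, ts):
    """Evaluate the magnitude across multiple scales t."""
    return np.array([magnitude(D, t) for t in ts])

# Two points at distance 1: the magnitude interpolates between ~1 (the points
# are indistinguishable at small scale) and 2 (fully separated at large scale),
# which is the "effective number of points" reading of magnitude.
D2 = np.array([[0.0, 1.0], [1.0, 0.0]])
vals = magnitude_function(D2, [0.01, 1.0, 100.0])
```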
Motivated by applications in text mining and discrete distribution inference,
we investigate testing the equality of probability mass functions of $K$
groups of high-dimensional multinomial distributions. A test statistic, which
is shown to have an asymptotic standard normal distribution under the null, is
proposed. The optimal detection boundary is established, and the proposed test
is shown to achieve this optimal detection boundary across the entire parameter
space of interest. The proposed method is demonstrated in simulation studies
and applied to analyze two real-world datasets to examine variation among
consumer reviews of Amazon movies and diversity of statistical paper abstracts.
( 2
min )
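The setting above can be illustrated with the classical chi-square test of homogeneity; the paper's statistic is designed for the high-dimensional regime where this classical test breaks down and is not reproduced here. Given count vectors for K groups over the same categories, compare observed counts to the counts expected under a common multinomial.

```python
import numpy as np

def chi2_homogeneity_stat(counts):
    """Chi-square statistic for a K x p table of category counts."""
    counts = np.asarray(counts, dtype=float)
    row = counts.sum(axis=1, keepdims=True)     # group totals
    col = counts.sum(axis=0, keepdims=True)     # pooled category totals
    expected = row * col / counts.sum()         # counts under equal p.m.f.s
    return ((counts - expected) ** 2 / expected).sum()

rng = np.random.default_rng(1)
p = np.array([0.5, 0.3, 0.2])
# Under the null (identical p.m.f.s) the statistic stays small; under the
# alternative it grows with the separation between the groups.
same = chi2_homogeneity_stat([rng.multinomial(1000, p) for _ in range(3)])
diff = chi2_homogeneity_stat([rng.multinomial(1000, p),
                              rng.multinomial(1000, [0.2, 0.3, 0.5])])
```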
In the multi-armed bandit framework, there are two formulations that are
commonly employed to handle time-varying reward distributions: adversarial
bandit and nonstationary bandit. Although their oracles, algorithms, and regret
analysis differ significantly, we provide a unified formulation in this paper
that smoothly bridges the two as special cases. The formulation uses an oracle
that takes the best-fixed arm within time windows. Depending on the window
size, it turns into the oracle in hindsight in the adversarial bandit and
dynamic oracle in the nonstationary bandit. We provide algorithms that attain
the optimal regret with the matching lower bound.
( 2
min )
Although gradient descent with momentum is widely used in modern deep
learning, a concrete understanding of its effects on the training trajectory
still remains elusive. In this work, we empirically show that momentum gradient
descent with a large learning rate and learning rate warmup displays large
catapults, driving the iterates towards flatter minima than those found by
gradient descent. We then provide empirical evidence and theoretical intuition
that the large catapult is caused by momentum "amplifying" the
self-stabilization effect (Damian et al., 2023).
( 2
min )
The exploration-exploitation dilemma has been a central challenge in
reinforcement learning (RL) with complex model classes. In this paper, we
propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound
(MQL-UCB) for RL with general function approximation. Our key algorithmic
design includes (1) a general deterministic policy-switching strategy that
achieves low switching cost, (2) a monotonic value function structure with
carefully controlled function class complexity, and (3) a variance-weighted
regression scheme that exploits historical trajectories with high data
efficiency. MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$
when $K$ is sufficiently large and near-optimal policy switching cost of
$\tilde{O}(dH)$, with $d$ being the eluder dimension of the function class, $H$
being the planning horizon, and $K$ being the number of episodes.
Our work sheds light on designing provably sample-efficient and
deployment-efficient Q-learning with nonlinear function approximation.
( 2
min )
Constrained optimization of the parameters of a simulator plays a crucial
role in a design process. These problems become challenging when the simulator
is stochastic, computationally expensive, and the parameter space is
high-dimensional. One can efficiently perform optimization only by utilizing
the gradient with respect to the parameters, but these gradients are
unavailable in many legacy, black-box codes. We introduce the algorithm
Scout-Nd (Stochastic Constrained Optimization for N dimensions) to tackle the
issues mentioned earlier by efficiently estimating the gradient, reducing the
noise of the gradient estimator, and applying multi-fidelity schemes to further
reduce computational effort. We validate our approach on standard benchmarks,
demonstrating its effectiveness in optimizing parameters and highlighting
better performance than existing methods.
( 2
min )
Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly
popular in financial applications, owing to certain desirable properties that
it enjoys. We consider the problem of estimating UBSR in a recursive setting,
where samples from the underlying loss distribution are available
one-at-a-time. We cast the UBSR estimation problem as a root finding problem,
and propose stochastic approximation-based estimation schemes. We derive
non-asymptotic bounds on the estimation error in the number of samples. We also
consider the problem of UBSR optimization within a parameterized class of
random variables. We propose a stochastic gradient descent based algorithm for
UBSR optimization, and derive non-asymptotic bounds on its convergence.
( 2
min )
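The recursive estimation idea above can be sketched as a Robbins-Monro scheme: cast the UBSR as the root t* of g(t) = E[l(X - t)] - lambda and nudge t toward it with one sample at a time. The loss function, step sizes, and distribution here are illustrative choices, not the paper's.

```python
import numpy as np

def ubsr_recursive(samples, loss, lam, t0=0.0):
    """Stochastic-approximation estimate of the root of E[loss(X - t)] = lam."""
    t = t0
    for k, x in enumerate(samples, start=1):
        # Move t up when the sampled risk exceeds lam, down otherwise.
        t += (1.0 / k) * (loss(x - t) - lam)
    return t

rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=1.0, size=100_000)
loss = lambda u: np.maximum(u + 1.0, 0.0)   # an increasing hockey-stick loss
t_hat = ubsr_recursive(samples, loss, lam=1.0)
```

Because E[loss(X - t)] is decreasing in t, the update is a descent direction for the root-finding problem, and the 1/k step sizes give the usual stochastic-approximation convergence.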
In a high-dimensional regression framework, we study consequences of the
naive two-step procedure where first the dimension of the input variables is
reduced and second, the reduced input variables are used to predict the output
variable with kernel regression. In order to analyze the resulting regression
errors, a novel stability result for kernel regression with respect to the
Wasserstein distance is derived. This allows us to bound errors that occur when
perturbed input data is used to fit the regression function. We apply the
general stability result to principal component analysis (PCA). Exploiting
known estimates from the literature on both principal component analysis and
kernel regression, we deduce convergence rates for the two-step procedure. The
latter turns out to be particularly useful in a semi-supervised setting.
( 2
min )
Density power divergence (DPD) is designed to robustly estimate the
underlying distribution of observations, in the presence of outliers. However,
DPD involves an integral of the power of the parametric density models to be
estimated; the explicit form of the integral term can be derived only for
specific densities, such as normal and exponential densities. While we may
perform a numerical integration for each iteration of the optimization
algorithms, the computational complexity has hindered the practical application
of DPD-based estimation to more general parametric densities. To address the
issue, this study introduces a stochastic approach to minimize DPD for general
parametric density models. The proposed approach can also be employed to
minimize other density power-based $\gamma$-divergences, by leveraging
unnormalized models.
( 2
min )
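The stochastic idea above rests on a simple identity: the DPD objective contains the integral of f_theta^{1+alpha}, which equals E_{X ~ f_theta}[f_theta(X)^alpha] and can therefore be estimated by sampling from the model instead of numerical integration. The sketch below checks this for a normal model, one of the few cases where the integral is also available in closed form; it illustrates the identity, not the paper's full estimator.

```python
import numpy as np

def normal_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def power_integral_mc(mu, s, alpha, n=200_000, seed=0):
    """Monte Carlo estimate of int f^(1+alpha) dx = E_{X~f}[f(X)^alpha]."""
    x = np.random.default_rng(seed).normal(mu, s, size=n)
    return (normal_pdf(x, mu, s) ** alpha).mean()

def power_integral_exact(mu, s, alpha):
    """Closed form for the normal density: (2 pi s^2)^(-alpha/2) / sqrt(1+alpha)."""
    return (2.0 * np.pi * s ** 2) ** (-alpha / 2.0) / np.sqrt(1.0 + alpha)

mc = power_integral_mc(0.5, 1.3, alpha=0.5)
exact = power_integral_exact(0.5, 1.3, alpha=0.5)
```

For general parametric densities no closed form exists, but the Monte Carlo estimator above still applies whenever the model can be sampled, which is the observation the stochastic approach builds on.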
We study the long time behavior of an underdamped mean-field Langevin (MFL)
equation, and provide a general convergence as well as an exponential
convergence rate result under different conditions. The results on the MFL
equation can be applied to study the convergence of the Hamiltonian gradient
descent algorithm for the overparametrized optimization. We then provide a
numerical example of the algorithm to train a generative adversarial network
(GAN).
( 2
min )
We consider the gradient descent flow widely used for the minimization of the
$\mathcal{L}^2$ cost function in Deep Learning networks, and introduce two
modified versions; one adapted for the overparametrized setting, and the other
for the underparametrized setting. Both have a clear and natural invariant
geometric meaning, taking into account the pullback vector bundle structure in
the overparametrized, and the pushforward vector bundle structure in the
underparametrized setting. In the overparametrized case, we prove that,
provided that a rank condition holds, all orbits of the modified gradient
descent drive the $\mathcal{L}^2$ cost to its global minimum at a uniform
exponential convergence rate. We point out relations of the latter to
sub-Riemannian geometry.
( 2
min )
The convergence of deterministic policy gradient under the Hadamard
parameterization is studied in the tabular setting and the linear convergence
of the algorithm is established. To this end, we first show that the error
decreases at an $O(\frac{1}{k})$ rate for all the iterations. Based on this
result, we further show that the algorithm has a faster local linear
convergence rate after $k_0$ iterations, where $k_0$ is a constant that only
depends on the MDP problem and the initialization. To show the local linear
convergence of the algorithm, we have indeed established the contraction of the
sub-optimal probability $b_s^k$ (i.e., the probability of the output policy
$\pi^k$ on non-optimal actions) when $k\ge k_0$.
( 2
min )
Navigating dynamic physical environments without obstructing or damaging
human assets is of quintessential importance for social robots. In this work,
we solve autonomous drone navigation's sub-problem of predicting out-of-domain
human and agent trajectories using a deep generative model. Our method,
General-PECNet (G-PECNet), achieves a 9.5\% improvement in Final Displacement
Error (FDE) over the 2020 benchmark PECNet through a combination of
architectural improvements inspired by periodic activation functions and
synthetic trajectory (data) augmentations using Hidden Markov Models (HMMs) and
Reinforcement Learning (RL). Additionally, we propose a simple
geometry-inspired metric for trajectory non-linearity and outlier detection,
helpful for the task. Code is available at
https://github.com/Aryan-Garg/PECNet-Pedestrian-Trajectory-Prediction.git
( 2
min )
Federated learning is a new learning paradigm that decouples data collection
and model training via multi-party computation and model aggregation. As a
flexible learning setting, federated learning has the potential to integrate
with other learning frameworks. We conduct a focused survey of federated
learning in conjunction with other learning algorithms. Specifically, we
explore various learning algorithms to improve the vanilla federated averaging
algorithm and review model fusion methods such as adaptive aggregation,
regularization, clustered methods, and Bayesian methods. Following the emerging
trends, we also discuss federated learning in the intersection with other
learning paradigms, termed federated X learning, where X includes multitask
learning, meta-learning, transfer learning, unsupervised learning, and
reinforcement learning. This survey reviews the state of the art, challenges,
and future directions.
( 2
min )
As the adoption of Artificial Intelligence (AI) systems within the clinical
environment grows, limitations in bandwidth and compute can create
communication bottlenecks when streaming imaging data, leading to delays in
patient care. As such, healthcare providers and AI vendors will require greater
computational infrastructure, dramatically increasing costs. To that end, we
developed ISLE, an intelligent streaming framework for high-throughput,
compute- and bandwidth-optimized, and cost-effective AI inference for clinical
decision making at scale. In our
experiments, ISLE on average reduced data transmission by 98.02% and decoding
time by 98.09%, while increasing throughput by 2,730%. We show that ISLE
results in faster turnaround times, and reduced overall cost of data,
transmission, and compute, without negatively impacting clinical decision
making using AI systems.
( 2
min )
The thrombotic microangiopathies (TMAs) manifest in renal biopsy histology
with a broad spectrum of acute and chronic findings. Precise diagnostic
criteria for a renal biopsy diagnosis of TMA are missing. As a first step
towards a machine learning- and computer vision-based analysis of whole slide
images from renal biopsies, we trained a segmentation model for the decisive
diagnostic kidney tissue compartments (artery, arteriole, glomerulus) on a set
of whole slide images from renal biopsies with TMAs and mimickers (distinct
diseases with a nephropathological appearance similar to TMA, such as severe
benign nephrosclerosis, various vasculitides, Bevacizumab-plug glomerulopathy,
and arteriolar light chain deposition disease). Our segmentation model combines
U-Net-based tissue detection with a Shifted-windows (Swin) transformer
architecture to
reach excellent segmentation results for even the most severely altered
glomeruli, arterioles and arteries, even on unseen staining domains from a
different nephropathology lab. With accurate automatic segmentation of the
decisive renal biopsy compartments in human renal vasculopathies, we have laid
the foundation for large-scale compartment-specific machine learning and
computer vision analysis of renal biopsy repositories with TMAs.
( 3
min )
Explainable Artificial Intelligence (XAI) is targeted at understanding how
models perform feature selection and derive their classification decisions.
This paper explores post-hoc explanations for deep neural networks in the audio
domain. Notably, we present a novel Open Source audio dataset consisting of
30,000 audio samples of English spoken digits which we use for classification
tasks on spoken digits and speakers' biological sex. We use the popular XAI
technique Layer-wise Relevance Propagation (LRP) to identify relevant features
for two neural network architectures that process either waveform or
spectrogram representations of the data. Based on the relevance scores obtained
from LRP, hypotheses about the neural networks' feature selection are derived
and subsequently tested through systematic manipulations of the input data.
Further, we take a step beyond visual explanations and introduce audible
heatmaps. We demonstrate the superior interpretability of audible explanations
over visual ones in a human user study.
( 2
min )
In the field of statistical physics, machine learning has gained significant
popularity and has achieved remarkable results in recent studies on phase
transitions. In this paper, we apply Principal Component Analysis (PCA) and an
Autoencoder (AE), both unsupervised learning methods, to study the various
configurations of the percolation model in an equilibrium phase transition. In
certain phase transition models, such as the DP model in non-equilibrium phase
transitions, the order parameter is particle density. However, in some other
phase transition models, such as the percolation model, it is not. In this
study, we randomized and selected percolation graphs as input for the networks
and analyzed the results, which indicate that the outputs of the single latent
variable of the AE and the first principal component of PCA are signals related
to particle density.
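The claim that the leading component tracks density can be illustrated with a minimal numpy sketch (our own toy stand-in for the paper's setup, using independent Bernoulli lattices rather than true percolation clusters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy configurations: 16x16 lattices with a per-sample occupation probability.
n_samples, side = 200, 16
p = rng.uniform(0.2, 0.8, size=n_samples)            # particle density per config
X = (rng.random((n_samples, side * side)) < p[:, None]).astype(float)

# PCA via SVD on the centered data; take the first principal component score.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]

# The first PC score tracks the particle density almost perfectly.
corr = np.corrcoef(pc1, p)[0, 1]
print(abs(corr))  # close to 1
```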
( 2
min )
We introduce a generalizable approach that combines perturbation method and
one-shot transfer learning to solve nonlinear ODEs with a single polynomial
term, using Physics-Informed Neural Networks (PINNs). Our method transforms
non-linear ODEs into linear ODE systems, trains a PINN across varied
conditions, and offers a closed-form solution for new instances within the same
non-linear ODE class. We demonstrate the effectiveness of this approach on the
Duffing equation and suggest its applicability to similarly structured PDEs and
ODE systems.
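As a sketch of the perturbative linearization (for the undamped, unforced Duffing equation; the paper's exact setting may differ), expanding $x = x_0 + \epsilon x_1 + \epsilon^2 x_2 + \cdots$ in $\ddot{x} + x + \epsilon x^3 = 0$ and collecting powers of $\epsilon$ gives

$$ O(1):\ \ddot{x}_0 + x_0 = 0, \qquad O(\epsilon):\ \ddot{x}_1 + x_1 = -x_0^3, $$

i.e., each order satisfies a linear ODE forced by lower-order terms, which is the linear system a single PINN can be trained on across conditions and then reused for new instances of the same nonlinear class.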
( 2
min )
In recent years, Large Language Models (LLM) have emerged as pivotal tools in
various applications. However, these models are susceptible to adversarial
prompt attacks, where attackers can carefully curate input strings that lead to
undesirable outputs. The inherent vulnerability of LLMs stems from their
input-output mechanisms, especially when presented with intensely
out-of-distribution (OOD) inputs. This paper proposes a token-level detection
method to identify adversarial prompts, leveraging the LLM's capability to
predict the next token's probability. We measure the degree of the model's
perplexity and incorporate neighboring token information to encourage the
detection of contiguous adversarial prompt sequences. As a result, we propose
two methods: one that identifies each token as either being part of an
adversarial prompt or not, and another that estimates the probability of each
token being part of an adversarial prompt.
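A minimal sketch of the token-level idea, assuming per-token negative log-likelihoods are already available from the LLM (the window size, threshold, and toy values below are our own illustrative choices, not the paper's):

```python
import numpy as np

def adversarial_token_scores(nll, window=3, threshold=4.0):
    """Token-level adversarial-prompt scores from per-token negative
    log-likelihoods (as an LLM would assign). Each token's score is the
    average NLL over a neighborhood, which encourages flagging contiguous
    high-perplexity spans. Returns (scores, boolean flags)."""
    nll = np.asarray(nll, dtype=float)
    kernel = np.ones(window) / window
    scores = np.convolve(nll, kernel, mode="same")   # neighbor smoothing
    return scores, scores > threshold

# Toy NLLs: a fluent prefix followed by a gibberish adversarial suffix.
nll = [1.2, 0.8, 1.5, 1.0, 9.0, 8.5, 9.8, 8.9, 1.1]
scores, flags = adversarial_token_scores(nll)
print(flags.tolist())
```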
( 2
min )
Zero-shot Dialogue State Tracking (DST) addresses the challenge of acquiring
and annotating task-oriented dialogues, which can be time-consuming and costly.
However, DST extends beyond simple slot-filling and requires effective updating
strategies for tracking dialogue state as conversations progress. In this
paper, we propose ParsingDST, a new In-Context Learning (ICL) method, to
introduce additional intricate updating strategies in zero-shot DST. Our
approach reformulates the DST task by leveraging powerful Large Language Models
(LLMs) and translating the original dialogue text to JSON through semantic
parsing as an intermediate state. We also design a novel framework that
includes more modules to ensure the effectiveness of updating strategies in the
text-to-JSON process. Experimental results demonstrate that our approach
outperforms existing zero-shot DST methods on MultiWOZ, exhibiting significant
improvements in Joint Goal Accuracy (JGA) and slot accuracy compared to
existing ICL methods. Our code has been released.
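The text-to-JSON intermediate state suggests an update step along these lines (a hypothetical merge strategy for illustration only; the paper's actual framework and modules are more involved):

```python
import json

def update_state(state, turn_update):
    """Merge a per-turn JSON update into the dialogue state.
    A value of None (JSON null) deletes a slot; nested domains are
    merged slot-wise. (Hypothetical updating strategy, for illustration.)"""
    for domain, slots in turn_update.items():
        dom = state.setdefault(domain, {})
        for slot, value in slots.items():
            if value is None:
                dom.pop(slot, None)      # explicit slot deletion
            else:
                dom[slot] = value        # add or overwrite
    return state

state = {"hotel": {"area": "north", "stars": "4"}}
turn = json.loads('{"hotel": {"stars": null, "parking": "yes"}}')
state = update_state(state, turn)
print(state)
```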
( 2
min )
To process sensor data in the Internet of Things (IoT), embedded deep
learning for 1-dimensional data is an important technique. In the past, CNNs
were frequently used because they are simple to optimise for special embedded
hardware such as FPGAs. This work proposes a novel LSTM cell optimisation aimed
at energy-efficient inference on end devices. Using the traffic speed
prediction as a case study, a vanilla LSTM model with the optimised LSTM cell
achieves 17534 inferences per second while consuming only 3.8 $\mu$J per
inference on the XC7S15 FPGA from the Spartan-7 family. It achieves at least
5.4$\times$ higher throughput and 1.37$\times$ better energy efficiency than
existing approaches.
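The reported throughput and per-inference energy together imply a modest average power draw, which is easy to sanity-check:

```python
# Sanity-check the reported figures: average power = energy/inference x rate.
energy_per_inference_j = 3.8e-6      # 3.8 microjoules per inference
inferences_per_second = 17534

power_w = energy_per_inference_j * inferences_per_second
print(round(power_w * 1e3, 1), "mW")  # ~66.6 mW average draw
```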
( 2
min )
The ability to construct a realistic simulator of financial exchanges,
including reproducing the dynamics of the limit order book, can give insight
into many counterfactual scenarios, such as a flash crash, a margin call, or
changes in macroeconomic outlook. In recent years, agent-based models have been
developed that reproduce many features of an exchange, as summarised by a set
of stylised facts and statistics. However, the ability to calibrate simulators
to a specific period of trading remains an open challenge. In this work, we
develop a novel approach to the calibration of market simulators by leveraging
recent advances in deep learning, specifically using neural density estimators
and embedding networks. We demonstrate that our approach is able to correctly
identify high probability parameter sets, both when applied to synthetic and
historical data, and without reliance on manually selected or weighted
ensembles of stylised facts.
( 2
min )
Normalizing flows (NF) recently gained attention as a way to construct
generative networks with exact likelihood calculation out of composable layers.
However, NF is restricted to dimension-preserving transformations. Surjection
VAE (SurVAE) has been proposed to extend NF to dimension-altering
transformations. Such networks are desirable because they are expressive and
can be precisely trained. We show that the approaches are a re-invention of PDF
projection, which appeared over twenty years earlier and is much further
developed.
( 2
min )
We present a new method that includes three key components of distributed
optimization and federated learning: variance reduction of stochastic
gradients, partial participation, and compressed communication. We prove that
the new method has optimal oracle complexity and state-of-the-art communication
complexity in the partial participation setting. Regardless of the
communication compression feature, our method successfully combines variance
reduction and partial participation: we get the optimal oracle complexity,
never need the participation of all nodes, and do not require the bounded
gradients (dissimilarity) assumption.
( 2
min )
Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly
popular in financial applications, owing to certain desirable properties that
it enjoys. We consider the problem of estimating UBSR in a recursive setting,
where samples from the underlying loss distribution are available
one-at-a-time. We cast the UBSR estimation problem as a root finding problem,
and propose stochastic approximation-based estimation schemes. We derive
non-asymptotic bounds on the estimation error in the number of samples. We also
consider the problem of UBSR optimization within a parameterized class of
random variables. We propose a stochastic gradient descent based algorithm for
UBSR optimization, and derive non-asymptotic bounds on its convergence.
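One common formulation (not necessarily the paper's exact one) defines UBSR as the root $t^*$ of $\mathbb{E}[\ell(X - t)] = \lambda$ for an increasing loss $\ell$, which naturally suggests a Robbins-Monro recursion processing one sample at a time. A toy sketch with the entropic loss, where the root is known in closed form; the step-size choice and clipping are our own safeguards:

```python
import numpy as np

rng = np.random.default_rng(1)

# UBSR as the root t* of E[l(X - t)] = lam for an increasing loss l.
# With l = exp and X ~ N(0, 1), E[exp(X - t)] = exp(1/2 - t), so lam = 1
# gives the closed-form root t* = 0.5.
l = np.exp
lam = 1.0

t = 0.0
for k in range(1, 300_001):
    x = rng.standard_normal()
    step = (l(x - t) - lam) / (k + 1)          # gain a_k = 1/(k+1)
    t += np.clip(step, -0.5, 0.5)              # guard against heavy-tailed jumps

print(t)  # should be near the root 0.5
```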
( 2
min )
Artificial neural networks can be represented by paths. Generating these paths
as random walks on a dense network graph, we find that the resulting sparse
networks allow for deterministic initialization and even weights with fixed
sign. Such
networks can be trained sparse from scratch, avoiding the expensive procedure
of training a dense network and compressing it afterwards. Although sparse,
weights are accessed as contiguous blocks of memory. In addition, enumerating
the paths using deterministic low discrepancy sequences, for example the Sobol'
sequence, amounts to connecting the layers of neural units by progressive
permutations, which naturally avoids bank conflicts in parallel computer
hardware. We demonstrate that the artificial neural networks generated by low
discrepancy sequences can achieve an accuracy within reach of their dense
counterparts at a much lower computational complexity.
( 2
min )
In the multi-armed bandit framework, there are two formulations that are
commonly employed to handle time-varying reward distributions: adversarial
bandit and nonstationary bandit. Although their oracles, algorithms, and regret
analysis differ significantly, we provide a unified formulation in this paper
that smoothly bridges the two as special cases. The formulation uses an oracle
that takes the best-fixed arm within time windows. Depending on the window
size, it turns into the oracle in hindsight in the adversarial bandit and
dynamic oracle in the nonstationary bandit. We provide algorithms that attain
the optimal regret with the matching lower bound.
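Concretely, one way to write the windowed oracle (a sketch of the idea, with $\mu_t(a)$ the mean reward of arm $a$ at time $t$, $a_t$ the played arm, and windows $W_1,\dots,W_{T/w}$ of length $w$) is

$$ R_T(w) \;=\; \sum_{j=1}^{T/w} \max_{a} \sum_{t \in W_j} \mu_t(a) \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} \mu_t(a_t)\Big], $$

so $w = T$ recovers the best-fixed-arm (hindsight) oracle of the adversarial bandit, while $w = 1$ recovers the dynamic oracle of the nonstationary bandit.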
( 2
min )
Deep neural networks (DNNs), the agents of deep learning (DL), require a
massive number of parallel/sequential operations. This makes it difficult to
comprehend DNNs' operations and impedes proper diagnosis. Without better
knowledge of their internal process, deploying DNNs in high-stakes domains can
lead to catastrophic failures. Therefore, to build more reliable DNNs/DL to be
deployed in high-stakes real-world problems, it is imperative that we gain
insights into DNNs' internal operations underlying their decision-making. Here,
we use the self-organizing map (SOM) to analyze DL models' internal codes
associated with DNNs' decision-making. Our analyses suggest that shallow layers
close to the input layer compress features into condensed space and that deep
layers close to the output layer expand feature space. We also found evidence
indicating that compressed features may underlie DNNs' vulnerabilities to
adversarial perturbations.
( 2
min )
In a high-dimensional regression framework, we study consequences of the
naive two-step procedure where first the dimension of the input variables is
reduced and second, the reduced input variables are used to predict the output
variable with kernel regression. In order to analyze the resulting regression
errors, a novel stability result for kernel regression with respect to the
Wasserstein distance is derived. This allows us to bound errors that occur when
perturbed input data is used to fit the regression function. We apply the
general stability result to principal component analysis (PCA). Exploiting
known estimates from the literature on both principal component analysis and
kernel regression, we deduce convergence rates for the two-step procedure. The
latter turns out to be particularly useful in a semi-supervised setting.
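A compact sketch of the two-step procedure on synthetic data (our own illustration: PCA to one component via SVD, then a hand-rolled Gaussian kernel ridge regression on the reduced inputs):

```python
import numpy as np

rng = np.random.default_rng(2)

# Inputs lie near a one-dimensional signal direction; the response is a
# smooth function of that signal. (Synthetic stand-in for the paper's setting.)
n, d = 300, 10
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
s = rng.standard_normal(n)                        # latent 1-D signal
X = np.outer(s, w) + 0.01 * rng.standard_normal((n, d))
y = np.sin(2 * s)

# Step 1: dimension reduction with PCA (one component).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
z = Xc @ Vt[:1].T                                 # reduced inputs, shape (n, 1)

# Step 2: Gaussian kernel ridge regression on the reduced inputs.
def gauss_kernel(a, b, sigma=0.5):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

ztr, zte, ytr, yte = z[:200], z[200:], y[:200], y[200:]
K = gauss_kernel(ztr, ztr)
alpha = np.linalg.solve(K + 1e-3 * np.eye(200), ytr)
pred = gauss_kernel(zte, ztr) @ alpha

mse = np.mean((pred - yte) ** 2)
print(mse)  # small: the reduced input retains the predictive direction
```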
( 2
min )
Linear regression is one of the most fundamental linear algebra problems.
Given a dense matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b$, the goal
is to find $x'$ such that
$ \| Ax' - b \|_2^2 \leq (1+\epsilon) \min_{x} \| A x - b \|_2^2 $. The best
classical algorithm takes $O(nd) + \mathrm{poly}(d/\epsilon)$ time [Clarkson
and Woodruff STOC 2013, Nelson and Nguyen FOCS 2013]. On the other hand,
quantum linear regression algorithms can achieve exponential quantum speedups,
as shown in [Wang Phys. Rev. A 96, 012335, Kerenidis and Prakash ITCS 2017,
Chakraborty, Gily{\'e}n and Jeffery ICALP 2019]. However, the running times of
these algorithms depend on some quantum linear algebra-related parameters, such
as $\kappa(A)$, the condition number of $A$. In this work, we develop a quantum
algorithm that runs in $\widetilde{O}(\epsilon^{-1}\sqrt{n}d^{1.5}) +
\mathrm{poly}(d/\epsilon)$ time. It provides a quadratic quantum speedup in $n$
over the classical lower bound without any dependence on data-dependent
parameters. In addition, we also show our result can be generalized to multiple
regression and ridge linear regression.
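For contrast with the quantum result, the classical sketch-and-solve paradigm from the cited line of work can be illustrated as follows (we use a dense Gaussian sketch for simplicity; Clarkson-Woodruff use a sparse embedding to reach their stated running time):

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch-and-solve for least squares: solve min ||S(Ax - b)|| for a random
# sketch S with m << n rows; the solution is (1+eps)-optimal w.h.p.
n, d, m = 5000, 10, 400
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)

x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)      # exact least squares

S = rng.standard_normal((m, n)) / np.sqrt(m)       # Gaussian sketch
x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

cost = lambda x: np.linalg.norm(A @ x - b) ** 2
ratio = cost(x_sk) / cost(x_opt)
print(ratio)  # close to 1, i.e. (1 + eps)-optimal
```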
( 2
min )
Mini-EUSO is a wide-angle fluorescence telescope that registers ultraviolet
(UV) radiation in the nocturnal atmosphere of Earth from the International
Space Station. Meteors are among multiple phenomena that manifest themselves
not only in the visible range but also in the UV. We present two simple
artificial neural networks that allow for recognizing meteor signals in the
Mini-EUSO data with high accuracy in terms of a binary classification problem.
We expect that similar architectures can be effectively used for signal
recognition in other fluorescence telescopes, regardless of the nature of the
signal. Due to their simplicity, the networks can be implemented in onboard
electronics of future orbital or balloon experiments.
( 3
min )
This document describes an approach used in the Multi-Machine Disruption
Prediction Challenge for Fusion Energy by ITU, a data science competition which
ran from September to November 2023, on the online platform Zindi. The
competition involved data from three fusion devices - C-Mod, HL-2A, and J-TEXT
- with most of the training data coming from the last two, and the test data
coming from the first one. Each device has multiple diagnostics and signals,
and it turns out that a critical issue in this competition was to identify
which signals, and especially which features from those signals, were most
relevant to achieve accurate predictions. The approach described here is based
on extracting features from signals, and then applying logistic regression on
top of those features. Each signal is treated as a separate predictor and, in
the end, a combination of such predictors achieved the first place on the
leaderboard.
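Schematically, the per-signal predictor combination looks like this (synthetic data and a hand-rolled logistic regression; the real diagnostics and feature extraction are of course device-specific):

```python
import numpy as np

rng = np.random.default_rng(4)

# Each signal yields a scalar feature; one logistic regression per signal,
# and the per-signal disruption probabilities are combined by averaging.
# (Synthetic stand-in data; the feature construction is invented.)
n = 400
y = rng.integers(0, 2, n).astype(float)             # disruptive vs. not
signals = [y + 0.8 * rng.standard_normal(n),        # feature from signal 1
           y + 1.0 * rng.standard_normal(n)]        # feature from signal 2

def fit_logreg(x, y, lr=0.5, steps=3000):
    """One-feature logistic regression trained by gradient descent."""
    w = b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
    return lambda x, w=w, b=b: 1.0 / (1.0 + np.exp(-(w * x + b)))

models = [fit_logreg(x, y) for x in signals]
p = np.mean([m(x) for m, x in zip(models, signals)], axis=0)
acc = np.mean((p > 0.5) == y)
print(acc)  # in-sample accuracy of the combined predictor
```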
( 2
min )
On dedicated analog hardware, equilibrium propagation is an energy-efficient
alternative to backpropagation. In spite of its theoretical guarantees, its
application in the AI domain remains limited to the discriminative setting.
Meanwhile, despite its high computational demands, generative AI is on the
rise. In this paper, we demonstrate the application of Equilibrium Propagation
in training a variational autoencoder (VAE) for generative modeling. Leveraging
the symmetric nature of Hopfield networks, we propose using a single model to
serve as both the encoder and decoder which could effectively halve the
required chip size for VAE implementations, paving the way for more efficient
analog hardware configurations.
( 2
min )
Although gradient descent with momentum is widely used in modern deep
learning, a concrete understanding of its effects on the training trajectory
still remains elusive. In this work, we empirically show that momentum gradient
descent with a large learning rate and learning rate warmup displays large
catapults, driving the iterates towards flatter minima than those found by
gradient descent. We then provide empirical evidence and theoretical intuition
that the large catapult is caused by momentum "amplifying" the
self-stabilization effect (Damian et al., 2023).
( 2
min )
A key problem in off-policy Reinforcement Learning (RL) is the mismatch, or
distribution shift, between the dataset and the distribution over states and
actions visited by the learned policy. This problem is exacerbated in the fully
offline setting. The main approach to correct this shift has been through
importance sampling, which leads to high-variance gradients. Other approaches,
such as conservatism or behavior-regularization, regularize the policy at the
cost of performance. In this paper, we propose a new approach for stable
off-policy Q-Learning. Our method, Projected Off-Policy Q-Learning (POP-QL), is
a novel actor-critic algorithm that simultaneously reweights off-policy samples
and constrains the policy to prevent divergence and reduce value-approximation
error. In our experiments, POP-QL not only shows competitive performance on
standard benchmarks, but also out-performs competing methods in tasks where the
data-collection policy is significantly sub-optimal.
( 2
min )
Foundation models, specifically Large Language Models (LLMs), have lately
gained widespread attention and adoption. Reinforcement Learning with Human
Feedback (RLHF) involves training a reward model to capture desired behaviors,
which is then used to align an LLM. These reward models are additionally used
at inference-time to estimate how well LLM responses adhere to those desired
behaviors. However, there is little work measuring how robust these reward
models are to distribution shifts. In this work, we evaluate how reward model
performance - measured via accuracy and calibration (i.e. alignment between
accuracy and confidence) - is affected by distribution shift. We show novel
calibration patterns and accuracy drops due to OOD prompts and responses, and
that the reward model is more sensitive to shifts in responses than prompts.
Additionally, we adapt an OOD detection technique commonly used in
classification to the reward model setting in order to detect these
distribution shifts in prompts and responses.
( 2
min )
In this research, we developed a graph-based framework to represent various
aspects of optimal thermal management system design, with the aim of rapidly
and efficiently identifying optimal design candidates. Initially, the
graph-based framework is utilized to generate diverse thermal management system
architectures. The dynamics of these system architectures are modeled under
various loading conditions, and an open-loop optimal controller is employed to
determine each system's optimal performance. These modeled cases constitute the
dataset, with the corresponding optimal performance values serving as the
labels for the data. In the subsequent step, a Graph Neural Network (GNN) model
is trained on 30% of the labeled data to predict the systems' performance,
effectively addressing a regression problem. Utilizing this trained model, we
estimate the performance values for the remaining 70% of the data, which serves
as the test set. In the third step, the predicted performance values are
employed to rank the test data, facilitating prioritized evaluation of the
design scenarios. Specifically, a small subset of the test data with the
highest estimated ranks undergoes evaluation via the open-loop optimal control
solver. This targeted approach concentrates on evaluating higher-ranked designs
identified by the GNN, replacing the exhaustive search (enumeration-based) of
all design cases. The results demonstrate a significant average reduction of
over 92% in the number of system dynamic modeling and optimal control analyses
required to identify optimal design scenarios.
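The prioritized-evaluation step can be sketched as follows, with a noisy surrogate standing in for the trained GNN and a synthetic performance array standing in for the expensive dynamic-modeling and optimal-control evaluations:

```python
import numpy as np

rng = np.random.default_rng(5)

# A surrogate (standing in for the trained GNN) scores the unlabeled designs;
# only the top-ranked few are passed to the expensive optimal-control solver.
n_designs = 1000
true_perf = rng.standard_normal(n_designs)                     # ground truth
surrogate = true_perf + 0.1 * rng.standard_normal(n_designs)   # GNN-like estimate

k = 50                                    # evaluate only the top 5% of designs
top_k = np.argsort(surrogate)[-k:]        # prioritized candidates
best_found = true_perf[top_k].max()       # expensive evaluation of top-k only

evals_saved = 1 - k / n_designs
print(evals_saved, best_found == true_perf.max())
```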
( 3
min )
Since no solutions have been proposed in Colombia that seek to reduce the
consumption of electricity at the residential level, this paper describes the
design and implementation of a simple prototype of a low-cost home energy
management system (HEMS). The objective of this platform is to monitor the
energy consumption of typical household devices so that users can access the
consumption of each device separately and then establish the strategy that
allows them to reduce energy consumption at home. In order to demonstrate that
our system is viable, the system has been evaluated by measuring weekly energy
consumption with the on-line and off-line HEMS using a test bench with typical
household devices in a Sincelejo typical household. The evaluation has shown
that with the installation of this HEMS, consumption is reduced by 27%. This
shows that it is possible to achieve a good reduction percentage with a
low-cost system.
( 2
min )
This paper investigates an approach to both speed up business decision-making
and lower the cost of learning through experimentation by factorizing business
policies and employing fractional factorial experimental designs for their
evaluation. We illustrate how this method integrates with advances in the
estimation of heterogeneous treatment effects, elaborating on its advantages
and foundational assumptions. We empirically demonstrate the implementation and
benefits of our approach and assess its validity in evaluating consumer
promotion policies at DoorDash, which is one of the largest delivery platforms
in the US. Our approach discovers a policy with 5% incremental profit at 67%
lower implementation cost.
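The factorization idea can be made concrete with a tiny $2^{3-1}$ fractional factorial design (the lever names are hypothetical, not DoorDash's actual policy factors):

```python
from itertools import product

# Three binary policy levers; the half-fraction uses defining relation C = AB,
# so 4 runs (instead of 8) still identify all main effects.
levers = ("promo_depth", "promo_frequency", "targeting")   # A, B, C
design = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

for run in design:
    print(dict(zip(levers, run)))
```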
( 2
min )
There is growing concern that the potential of black box AI may exacerbate
health-related disparities and biases such as gender and ethnicity in clinical
decision-making. Biased decisions can arise from data availability and
collection processes, as well as from the underlying confounding effects of the
protected attributes themselves. This work proposes a machine learning-based
orthogonal approach aiming to analyze and suppress the effect of the confounder
through discriminant dimensionality reduction and orthogonalization of the
protected attributes against the primary attribute information. By doing so,
the impact of the protected attributes on disease diagnosis can be realized,
undesirable feature correlations can be mitigated, and the model prediction
performance can be enhanced.
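A minimal linear version of the orthogonalization step can be sketched as follows (our own illustration: residualize each feature column against the protected attribute, so the remaining features carry no linear information about it):

```python
import numpy as np

rng = np.random.default_rng(6)

# Features that are partially driven by a protected attribute z.
n, d = 500, 5
z = rng.standard_normal(n)                    # protected attribute
X = np.outer(z, rng.standard_normal(d)) + rng.standard_normal((n, d))

# Project z out of every (centered) feature column.
z0 = z - z.mean()
X0 = X - X.mean(axis=0)
X_orth = X0 - np.outer(z0, (z0 @ X0) / (z0 @ z0))

corr = [abs(np.corrcoef(X_orth[:, j], z)[0, 1]) for j in range(d)]
print(max(corr))  # ~0 after orthogonalization
```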
( 2
min )
With the rise of Large Language Models (LLMs), notably characterized by GPT
frameworks, there emerges a catalyst for novel healthcare applications. Earlier
iterations of chatbot caregivers, though existent, have yet to achieve a
dimension of human-like authenticity. This paper unveils `MemoryCompanion', a
pioneering digital health solution explicitly tailored for Alzheimer's disease
(AD) patients and their caregivers. Drawing upon the nuances of GPT technology
and prompt engineering, MemoryCompanion manifests a personalized caregiving
paradigm, fostering interactions via voice-cloning and talking-face mechanisms
that resonate with the familiarity of known companions. Using advanced
prompt-engineering, the system intricately adapts to each patient's distinct
profile, curating its content and communication style accordingly. This
approach strives to counteract prevalent issues of social isolation and
loneliness frequently observed in AD demographics. Our methodology, grounded in
its innovative design, addresses both the caregiving and technological
challenges intrinsic to this domain.
( 2
min )
In this work, we present a method to generate a configurational level
fingerprint for polymers using the Bead-Spring-Model. Unlike some of the
previous fingerprinting approaches that employ monomer-level information where
atomistic descriptors are computed using quantum chemistry calculations, this
approach incorporates configurational information from a coarse-grained model
of a long polymer chain. The proposed approach may be advantageous for the
study of behavior resulting from large molecular weights. To create this
fingerprint, we make use of two kinds of descriptors. First, we calculate
certain geometric descriptors, such as the squared end-to-end distance
($R_e^2$) and the squared radius of gyration ($R_g^2$), and label them as
Calculated Descriptors. Second, we generate a set of data-driven descriptors
using an
unsupervised autoencoder model and call them Learnt Descriptors. Using a
combination of both of them, we are able to learn mappings from the structure
to various properties of the polymer chain by training ML models. We test our
fingerprint to predict the probability of occurrence of a configuration at
equilibrium, which is approximated by a simple linear relationship between the
instantaneous internal energy and equilibrium average internal energy.
( 2
min )
Through the advancement in natural language processing (NLP), specifically in
speech recognition, fully automated complex systems functioning on voice input
have started proliferating in areas such as home automation. These systems have
been termed Automatic Speech Recognition Systems (ASR). In this review paper,
we explore the feasibility of an end-to-end system providing speech and text
based natural language processing for job interview preparation as well as
recommendation of relevant job postings. We also explore existing
recommender-based systems and note their limitations. This literature review
would help us identify the approaches and limitations of the various similar
use-cases of NLP technology for our upcoming project.
( 2
min )
Amazon Web Services and NVIDIA will bring the latest generative AI technologies to enterprises worldwide. Combining AI and cloud computing, NVIDIA founder and CEO Jensen Huang joined AWS CEO Adam Selipsky Tuesday on stage at AWS re:Invent 2023 at the Venetian Expo Center in Las Vegas. Selipsky said he was “thrilled” to announce the expansion.
( 6
min )
Researchers and developers at leading pharmaceutical and techbio companies can now easily deploy NVIDIA Clara software and services for accelerated healthcare through Amazon Web Services. Announced today at AWS re:Invent, the initiative gives healthcare and life sciences developers using AWS cloud resources the flexibility to integrate NVIDIA-accelerated offerings such as NVIDIA BioNeMo.
( 6
min )
Developing more intelligent robots in the cloud is about to get a speed multiplier. NVIDIA Isaac Sim and NVIDIA L40S GPUs are coming to Amazon Web Services, enabling developers to build and deploy accelerated robotics applications in the cloud. Isaac Sim, an extensible simulator for AI-enabled robots, is built on the NVIDIA Omniverse development platform.
( 6
min )
Everything about large language models is big — giant models train on massive datasets across thousands of NVIDIA GPUs. That can pose a lot of big challenges for companies pursuing generative AI. NVIDIA NeMo, a framework for building, customizing and running LLMs, helps overcome these challenges.
( 5
min )
This week’s talented In the NVIDIA Studio artist, Nourhan Ismail, created a literal NVIDIA studio.
( 7
min )
The immediate and pressing need for ‘digitizing’ your supply-chain. One may conclude: ‘digitizing’ the supply-chain has become a survival necessity for companies to stay competitive. Apart from a substantial jump in efficiency-effectiveness, the customer experience, and upside to revenues, companies can expect a huge cost saving… Read more: Data-driven, AI-powered supply chain part 3: Imagining the Future – Supply chain 5.0
( 25
min )
The viability of the ‘Viable Vision’. I did hear about the Theory of Constraints (TOC) off and on through the late 90s, but I didn’t pay much attention until late 2001. One of the i2 consultants I met at their annual meet in Malaysia had one too many, and ended up lecturing me on how… Read more: Data-driven supply chain part 2: The theory of constraints & the concept of the information supply chain.
( 28
min )
While the world is going wild over the potential benefits of generative AI, there’s little attention paid to the data deployed to build and operate these tools. Let’s look at a few examples to explore what’s involved in determining data use, and why this matters for end users as well as operators. Text-based generative AI… Read more: Here’s How Much Data Gets Used By Generative AI Tools For Each Request
( 21
min )
Earlier in the fall, Charles Hoffman joined our non-profit Dataworthy Collective (DC) that focuses on best practices in trusted knowledge graph development. Hoffman is a CPA, consultant and former PwC auditor who works with clients who use the Extensible Business Reporting Language (XBRL). For those who don’t know the history of standard digital business reporting… Read more: Trusted, automated data sharing across spreadsheets and other documents
( 20
min )
Learning unsupervised world models for autonomous driving has the potential
to improve the reasoning capabilities of today's systems dramatically. However,
most work neglects the physical attributes of the world and focuses on sensor
data alone. We propose MUVO, a MUltimodal World Model with Geometric VOxel
Representations to address this challenge. We utilize raw camera and lidar data
to learn a sensor-agnostic geometric representation of the world, which can
directly be used by downstream tasks, such as planning. We demonstrate
multimodal future predictions and show that our geometric representation
improves the prediction quality of both camera images and lidar point clouds.
( 2
min )
In echocardiographic view classification, accurately detecting
out-of-distribution (OOD) data is essential but challenging, especially given
the subtle differences between in-distribution and OOD data. While conventional
OOD detection methods, such as Mahalanobis distance (MD), are effective in
far-OOD scenarios with clear distinctions between distributions, they struggle
to discern the less obvious variations characteristic of echocardiographic
data. In this study, we introduce a novel use of label smoothing to enhance
semantic feature representation in echocardiographic images, demonstrating that
these enriched semantic features are key for significantly improving near-OOD
instance detection. By combining label smoothing with MD-based OOD detection,
we establish a new benchmark for accuracy in echocardiographic OOD detection.
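As a rough illustration of the MD-based scoring step described above, here is a minimal sketch on synthetic stand-in features; the label-smoothing-trained feature extractor, the real echocardiographic embeddings, and any per-class statistics are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical in-distribution feature vectors, standing in for the
# penultimate-layer embeddings of an echocardiographic view classifier.
train_feats = rng.normal(0.0, 1.0, size=(500, 8))

mean = train_feats.mean(axis=0)
cov = np.cov(train_feats, rowvar=False)
# Small ridge term keeps the inverse numerically stable.
precision = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))

def mahalanobis_score(x):
    """Squared Mahalanobis distance of a feature vector to the
    in-distribution Gaussian fit; larger means more OOD-like."""
    d = x - mean
    return float(d @ precision @ d)

near_sample = rng.normal(0.0, 1.0, size=8)  # in-distribution-like
far_sample = rng.normal(5.0, 1.0, size=8)   # shifted, OOD-like
```

The paper's contribution is that label smoothing sharpens the features feeding this score, so even near-OOD samples, not just obviously shifted ones, separate under it.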
( 2
min )
Tabular data is hard to acquire and is subject to missing values. This paper
proposes a novel approach to generate and impute mixed-type (continuous and
categorical) tabular data using score-based diffusion and conditional flow
matching. Contrary to previous work that relies on neural networks to learn the
score function or the vector field, we instead rely on XGBoost, a popular
Gradient-Boosted Tree (GBT) method. We empirically show on 27 different
datasets that our approach i) generates highly realistic synthetic data when
the training dataset is either clean or tainted by missing data and ii)
generates diverse plausible data imputations. Furthermore, our method
outperforms deep-learning generation methods on data generation and is
competitive on data imputation. Finally, it can be trained in parallel using
CPUs without the need for a GPU. To make it easily accessible, we release our
code through a Python library and an R package.
( 2
min )
A common forecasting setting in real world applications considers a set of
possibly heterogeneous time series of the same domain. Due to different
properties of each time series, such as length, obtaining forecasts for each
individual time series in a straightforward way is challenging. This paper
proposes a general framework utilizing a similarity measure in Dynamic Time
Warping to find similar time series to build neighborhoods in a k-Nearest
Neighbor fashion, and improve forecasts of possibly simple models by averaging.
Several ways of performing the averaging are suggested, and theoretical
arguments underline the usefulness of averaging for forecasting. Additionally,
diagnostic tools are proposed, allowing a deep understanding of the procedure.
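The neighborhood-averaging idea can be sketched as follows; the plain DTW recursion and the single-value forecasts here are illustrative placeholders, not the paper's exact distance measure or base models.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-time-warping distance
    between two univariate series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Step patterns: insertion, deletion, match.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_average_forecast(target, pool, pool_forecasts, k=3):
    """Average the forecasts of the k series in `pool` closest to
    `target` under DTW, in k-nearest-neighbor fashion."""
    dists = [dtw_distance(target, s) for s in pool]
    idx = np.argsort(dists)[:k]
    return np.mean([pool_forecasts[i] for i in idx], axis=0)
```

Because DTW tolerates series of different lengths, the neighborhoods can be built across a heterogeneous set of series, which is exactly the setting the abstract describes.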
( 2
min )
Recent results show that estimates defined by over-parametrized deep neural
networks learned by applying gradient descent to a regularized empirical $L_2$
risk are universally consistent and achieve good rates of convergence. In this
paper, we show that the regularization term is not necessary to obtain similar
results. In the case of a suitably chosen initialization of the network, a
suitable number of gradient descent steps, and a suitable step size we show
that an estimate without a regularization term is universally consistent for
bounded predictor variables. Additionally, we show that if the regression
function is H\"older smooth with H\"older exponent $1/2 \leq p \leq 1$, the
$L_2$ error converges to zero with a convergence rate of approximately
$n^{-1/(1+d)}$. Furthermore, in case of an interaction model, where the
regression function consists of a sum of H\"older smooth functions with $d^*$
components, a rate of convergence is derived which does not depend on the input
dimension $d$.
( 2
min )
The Amazon Elastic Compute Cloud (Amazon EC2) accelerated computing portfolio offers the broadest choice of accelerators to power your artificial intelligence (AI), machine learning (ML), graphics, and high performance computing (HPC) workloads. We are excited to announce the expansion of this portfolio with three new instances featuring the latest NVIDIA GPUs: Amazon EC2 P5e instances powered […]
( 4
min )
Today, Amazon SageMaker launches a new version (0.25.0) of Large Model Inference (LMI) Deep Learning Containers (DLCs) and adds support for NVIDIA’s TensorRT-LLM Library. With these upgrades, you can effortlessly access state-of-the-art tooling to optimize large language models (LLMs) on SageMaker and achieve price-performance benefits – Amazon SageMaker LMI TensorRT-LLM DLC reduces latency by 33% […]
( 9
min )
Generative artificial intelligence (generative AI) models have demonstrated impressive capabilities in generating high-quality text, images, and other content. However, these models require massive amounts of clean, structured training data to reach their full potential. Most real-world data exists in unstructured formats like PDFs, which requires preprocessing before it can be used effectively. According to IDC, […]
( 10
min )
This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI. This is the third post in a series discussing the integration of Salesforce Data Cloud and Amazon SageMaker. In Part 1 and Part 2, we show how the Salesforce Data Cloud and Einstein Studio integration with SageMaker allows businesses to access their […]
( 12
min )
Artificial intelligence (AI) continues to transform how we do business and serve our customers. AWS offers a range of pre-trained AI services that provide ready-to-use intelligence for your applications. In this post, we explore the new AI service capabilities and how they are enhanced using foundation models (FMs). We focus on the following major updates […]
( 7
min )
In this post, we talk about how generative AI is changing the conversational AI industry by providing new customer and bot builder experiences, and the new features in Amazon Lex that take advantage of these advances. As the demand for conversational AI continues to grow, developers are seeking ways to enhance their chatbots with human-like […]
( 7
min )
Human Guided Exploration (HuGE) enables AI agents to learn quickly with some help from humans, even if the humans make mistakes.
( 11
min )
Amazon Transcribe is a fully managed automatic speech recognition (ASR) service that makes it straightforward for you to add speech-to-text capabilities to your applications. Today, we are happy to announce a next-generation multi-billion parameter speech foundation model-powered system that expands automatic speech recognition to over 100 languages. In this post, we discuss some of the […]
( 7
min )
Today, we are excited to announce three launches that will help you enhance personalized customer experiences using Amazon Personalize and generative AI. Whether you’re looking for a managed solution or build your own, you can use these new capabilities to power your journey. Amazon Personalize is a fully managed machine learning (ML) service that makes […]
( 8
min )
Amazon Personalize is excited to announce the new Next Best Action (aws-next-best-action) recipe to help you determine the best actions to suggest to your individual users that will enable you to increase brand loyalty and conversion. Amazon Personalize is a fully managed machine learning (ML) service that makes it effortless for developers to deliver highly […]
( 8
min )
NVIDIA today launched a cloud service for medical imaging AI to further streamline and accelerate the creation of ground-truth data and training of specialized AI models through fully managed, cloud-based application programming interfaces. NVIDIA MONAI cloud APIs — announced at the annual meeting of RSNA, the Radiological Society of North America, taking place this week Read article >
( 7
min )
This post is co-written with Marc Neumann, Amor Steinberg and Marinus Krommenhoek from BMW Group. The BMW Group – headquartered in Munich, Germany – is driven by 149,000 employees worldwide and manufactures in over 30 production and assembly facilities across 15 countries. Today, the BMW Group is the world’s leading manufacturer of premium automobiles and […]
( 11
min )
In today’s ever-evolving world of ecommerce, the influence of a compelling product description cannot be overstated. It can be the decisive factor that turns a potential visitor into a paying customer or sends them clicking off to a competitor’s site. The manual creation of these descriptions across a vast array of products is a labor-intensive […]
( 9
min )
Amazon SageMaker Canvas is a rich, no-code Machine Learning (ML) and Generative AI workspace that has allowed customers all over the world to more easily adopt ML technologies to solve old and new challenges thanks to its visual, no-code interface. It does so by covering the ML workflow end-to-end: whether you’re looking for powerful data […]
( 9
min )
This post was co-written with Greg Benson, Chief Scientist; Aaron Kesler, Sr. Product Manager; and Rich Dill, Enterprise Solutions Architect from SnapLogic. Many customers are building generative AI apps on Amazon Bedrock and Amazon CodeWhisperer to create code artifacts based on natural language. This use case highlights how large language models (LLMs) are able to […]
( 17
min )
Sponsored Post Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )